Generic Nekhoroshev theory without 
small divisors. 

Abed Bounemoura and Laurent Niederman
November 9, 2010 

Dedicated to the memory of N.N. Nekhoroshev (1946-2008) 

Abstract 

In this article, we present a new approach to Nekhoroshev's theory
for a generic unperturbed Hamiltonian which completely avoids
small divisors problems. The proof is an extension of a method
introduced by P. Lochak: it combines averaging along periodic orbits
with simultaneous Diophantine approximation and uses geometric
arguments designed by the second author to handle generic integrable
Hamiltonians. This method allows one to deal with generic non-analytic
Hamiltonians and to obtain new results of generic stability around
linearly stable tori.

1 Introduction 

1. In this article, we are concerned with the stability properties of
near-integrable analytic Hamiltonian systems. According to a classical
theorem of Liouville-Arnold (see [AKN06]), such systems are locally
governed by a Hamiltonian of the form

\[
\begin{cases}
H(\theta,I) = h(I) + f(\theta,I) \\
|f| < \varepsilon \ll 1
\end{cases}
\]

where $(\theta,I) \in \mathbb{T}^n \times \mathbb{R}^n$ are action-angle coordinates for $h$ and $f$ is a
small perturbation in some suitable topology. For the integrable system,
that is when $f = 0$, the action variables of solutions are trivially
constant for all times, but when $f \neq 0$ they are no longer constants of
motion and we are interested in studying their evolution over long
intervals of time.

2. But first it is important to understand the integrable case. When
$H = h$ depends only on the action variables, as the latter are constant
for all times, the phase space is trivially foliated into invariant tori
$\mathcal{T}_0 = \mathbb{T}^n \times \{I_0\}$, for $I_0 \in \mathbb{R}^n$, and on each torus $\mathcal{T}_0$ the flow is
quasi-periodic with frequency vector $\omega_0 = \nabla h(I_0) \in \mathbb{R}^n$. The dynamics
of such a flow is completely understood and depends on the frequency
vector $\omega_0$, more precisely on its resonant module

\[ \mathcal{M}(\omega_0) = \{ k \in \mathbb{Z}^n \mid k.\omega_0 = 0 \}, \]

where the dot denotes the Euclidean scalar product. If $\mathcal{M}(\omega_0)$ is
trivial, then the dynamics is minimal and uniquely ergodic. Otherwise,
we have a relation of the form $k.\omega_0 = 0$ for some $k \in \mathbb{Z}^n \setminus \{0\}$, which
is usually called a resonance, and denoting by $m$ the rank of $\mathcal{M}(\omega_0)$,
the torus $\mathcal{T}_0$ splits into a continuous $m$-parameter family of invariant
sub-tori of dimension $n - m$, on which the dynamics is minimal and
uniquely ergodic. These are called resonant tori, and in the case of
maximal resonances (i.e. $m = n - 1$ if $h$ does not have critical points),
the tori are foliated into periodic orbits. Under some non-degeneracy
assumption on $h$, both resonant and non-resonant tori form a dense
subset of the phase space.

3. Returning to the perturbed system, since Poincaré we know that
resonant tori do not survive (actually he proved that for a periodic
torus, generically only a finite number of periodic orbits persist). But
it was a remarkable idea of Kolmogorov ([Kol54]) to focus on non-resonant
tori to prove that a set of large measure of invariant tori survives
under some regularity and non-degeneracy assumptions. This has now become
a rich and vast subject called KAM theory (see [Pös01], [dlL01] or
[Bos86] for some nice introductions to this theory). Such tori persist in
a $\sqrt{\varepsilon}$-neighbourhood of the unperturbed ones and therefore, for a set of
large measure of initial conditions, the variation of the actions is of
order $\sqrt{\varepsilon}$ for all time. But on the other hand, this set of KAM tori is
typically a Cantor family (hence with no interior) and the theory gives
no information on the complement, except when $n = 2$ where these
two-dimensional invariant tori disconnect the three-dimensional energy
level, leaving all solutions stable for all time. However for $n \geq 3$, it
is still possible to find solutions for which the variation of the action
components is of order one. A proof of this fact was outlined by Arnold
in his famous paper ([Arn64]) where he proposed a mechanism to produce
examples of near-integrable Hamiltonian systems where such a drift occurs
no matter how small the perturbation is. This phenomenon is usually
referred to as Arnold diffusion.

4. Hence for $n \geq 3$, results of stability for near-integrable Hamiltonian
systems which are valid for an open set of initial conditions can only be
proved over finite times. This picture was completed by Nekhoroshev in
the seventies (see [Nek77], [Nek79] and [Nie09] for a recent overview of
the theory) who proved the following: if the system is analytic and the
unperturbed Hamiltonian $h$ satisfies some quantitative transversality
condition called steepness, then there exist positive constants $a$, $b$,
$\varepsilon_0$, $c_1$, $c_2$ and $c_3$ depending only on $h$, such that every solution
$(\theta(t),I(t))$ of the perturbed system starting at time $t = 0$ satisfies

\[ |I(t) - I(0)| < c_1 \varepsilon^b, \quad |t| < c_2 \exp(c_3 \varepsilon^{-a}), \tag{1} \]

provided that the size of the perturbation $\varepsilon$ is smaller than the
threshold $\varepsilon_0$. The constants $a$ and $b$ are called the stability exponents.
If property (1) is satisfied, we shall say that the integrable
Hamiltonian $h$ is exponentially stable. Hence, KAM and Nekhoroshev's
theory yield different types of stability results, but they both
ultimately rely on the same tool, which is the construction of normal
forms, and we shall describe it below.

5. The basic idea is to look at a "more integrable" Hamiltonian which
yields a good approximation of the perturbed system. By the averaging
principle (see [AKN06]), this simpler Hamiltonian is given by the time
average of the system along the unperturbed flow, that is

\[ [H] = h + [f], \]

where

\[ [f] = \lim_{t \to \infty} \frac{1}{t} \int_0^t f \circ \Phi_s^h \, ds, \]

and $\Phi_s^h$ is the Hamiltonian flow of the integrable part $h$. Actually, this
average depends on the dynamics of the unperturbed Hamiltonian and hence
on resonant modules associated to frequencies. So given a sub-module
$\mathcal{M} \subseteq \mathbb{Z}^n$, we define its resonant manifold by

\[ S_{\mathcal{M}} = \{ I \in \mathbb{R}^n \mid k.\nabla h(I) = 0 \text{ for } k \in \mathcal{M} \}. \]

Due to the ergodic properties of the linear flow with vector $\nabla h(I)$ over
the torus $\mathbb{T}^n$, the time average over $S_{\mathcal{M}}$ equals the space average along
a torus of dimension $n - m$ if $m$ is the multiplicity of the resonance
(i.e. the rank of $\mathcal{M}$), hence $n - m$ angles have been removed in this case.
From a physical point of view, the guiding principle is that rapidly
oscillating terms discarded in averaging cause only small oscillations
which are superimposed on the solutions of the averaged system. In order
to prove this claim, one should check that any solution of the perturbed
system remains close to the solution of the averaged system with the same
initial condition. In particular, this will be the case if one finds a
canonical transformation $\varepsilon$-close to identity which conjugates the
perturbed Hamiltonian to its average. Hence we are reduced to a problem
of normal form where one tries to conjugate the system to a simpler one,
that is we look for a convenient system of coordinates.

However, constructing such a good system of coordinates is not an easy
task. The linearised equation of conjugation reads

\[ \{\chi, h\} = f - [f], \]

if $\chi$ is the function generating the conjugation. This is usually called
a homological equation and to solve it we need to invert the linear
operator $L_h = \{\,.\,, h\}$ acting on a suitable space of functions. Here our
operator is invertible, but its inverse is generally unbounded: this is
the small divisors phenomenon. To see this, just note that once an action
$I \in S_{\mathcal{M}}$ is fixed (and hence a frequency $\omega = \nabla h(I)$ satisfying $k.\omega \neq 0$
for $k \notin \mathcal{M}$), the homological equation is just a first-order linear
partial differential equation with constant coefficients on $\mathbb{T}^n$, namely

\[ \omega.\nabla\chi = f - [f]. \]

Such equations are known to be well-suited for Fourier analysis: in our
case the operator is easily diagonalized in a Fourier basis and we find
that the eigenvalues are proportional to the scalar products $k.\omega$, for
$k \in \mathbb{Z}^n$. More precisely, expanding $\chi$ and $f$ as

\[ \chi(\theta) = \sum_{k \in \mathbb{Z}^n} \chi_k e^{i2\pi k.\theta}, \qquad f(\theta) = \sum_{k \in \mathbb{Z}^n} f_k e^{i2\pi k.\theta}, \]

then

\[ [f] = \sum_{k \in \mathcal{M}} f_k e^{i2\pi k.\theta}, \]

and so formally

\[
\chi_k =
\begin{cases}
(i2\pi k.\omega)^{-1} f_k, & k \notin \mathcal{M}, \\
0, & k \in \mathcal{M}.
\end{cases}
\tag{2}
\]

The scalar products $k.\omega$ appearing in the denominators of (2) are not zero
by assumption, but they can be arbitrarily small and this is inevitable
for large integers $k$ (see the estimate (3) below). This can cause the
divergence of the Fourier series of $\chi$ and hence the unboundedness of the
inverse of $L_h$. Classical small divisors techniques are concerned with
obtaining lower bounds for these scalar products to ensure the
convergence of the series, and this necessarily leads to complicated
estimates. Furthermore, to obtain a result applying to all solutions, a
partition of the phase space into resonant manifolds associated to
different modules, usually called the geometry of resonances, has to be
achieved and this is a delicate task. All these techniques are very
important, in particular to study Arnold diffusion and related problems;
however, we will show that they are not necessary to prove Nekhoroshev's
estimates.
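
As a purely illustrative numerical sketch (it plays no role in the argument), the following Python snippet computes by brute force the smallest divisor $|k.\omega|$ over $0 < |k|_1 \leq K$. The vector $\omega = (1, \sqrt{2})$ and the function name `smallest_divisor` are assumptions made for this example only. The divisors entering (2) shrink as the cut-off $K$ grows.

```python
import math
from itertools import product


def smallest_divisor(omega, K):
    """Return min |k.omega| over integer vectors k with 0 < |k|_1 <= K."""
    best = math.inf
    for k in product(range(-K, K + 1), repeat=len(omega)):
        if 0 < sum(abs(c) for c in k) <= K:
            best = min(best, abs(sum(c * w for c, w in zip(k, omega))))
    return best


# Hypothetical example: omega = (1, sqrt(2)) has a trivial resonant module,
# so every divisor is nonzero, yet the minimum decreases with the cut-off K.
omega = (1.0, math.sqrt(2.0))
divisors = {K: smallest_divisor(omega, K) for K in (5, 20, 80)}
for K, d in divisors.items():
    print(f"K = {K:3d}   min |k.omega| = {d:.6f}")
```

The search is exponential in the dimension, so it is only meant for very small $n$ and $K$.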

6. Indeed, all these problems are completely bypassed if we only average 
along periodic orbits of the unperturbed flow. We first recall the following 
definition. 

Definition 1.1. A vector $\omega \in \mathbb{R}^n$ is said to be periodic if there exists
a real number $t > 0$ such that $t\omega \in \mathbb{Z}^n$. In this case, the number

\[ T = \inf\{ t > 0 \mid t\omega \in \mathbb{Z}^n \} \]

is called the period of $\omega$.

A basic example is given by a vector with rational components: when these
components are written in lowest terms and their numerators have no
common factor, its period is just the least common multiple of the
denominators. Geometrically, if $\omega$ is $T$-periodic, an invariant torus with
a linear flow with vector $\omega$ is filled with $T$-periodic orbits. In this
case, the average along such a periodic solution is given by

\[ [f] = \frac{1}{T} \int_0^T f \circ \Phi_s^l \, ds, \]

where $l$ denotes the linear Hamiltonian with frequency $\omega$, that is
$l(I) = \omega.I$. Then the homological equation $\{\chi, l\} = f - [f]$ is easily
solved without using Fourier expansions, and its solution is given by an
explicit integral formula

\[ \chi = \frac{1}{T} \int_0^T (f - [f]) \circ \Phi_s^l \, s \, ds. \]



So in this case, there are no small divisors. To understand the previous
sentence more concretely, consider a vector $\omega \in \mathbb{R}^n$ and multi-integers $k$
that do not resonate with $\omega$ (that is $k \notin \mathbb{Z}^n \cap \omega^\perp$). Then in general we
don't have a lower bound on the divisors $k.\omega$ that appear in (2), and by a
theorem of Dirichlet one has the upper bound
\[ \min_{0 < |k|_1 \leq K} |k.\omega| \leq \frac{|\omega|_1}{K^{n-1}}. \tag{3} \]

In that context, small divisors techniques use Diophantine vectors for
which $|k.\omega| \geq \gamma |k|_1^{-\tau}$, with $\gamma > 0$, $\tau \geq n - 1$ and where $|\,.\,|_1$ stands for
the $\ell^1$-norm, but nevertheless the lower bound deteriorates as $|k|_1$
increases, causing extra difficulties (which are usually handled by the
so-called ultra-violet cut-off). However if the vector $\omega$ is $T$-periodic,
one simply has $|k.\omega| \geq T^{-1}$ and the lower bound is uniform in $|k|_1$.
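
The uniform bound can be checked by hand on a small example. The sketch below is only an illustration: the helper names `period` and `smallest_divisor` and the sample vector are assumptions, not taken from the text. It computes the period of a rational vector exactly, using the elementary fact that for components $p_i/q_i$ in lowest terms the period is $\mathrm{lcm}(q_i)/\gcd(p_i)$. It then verifies that the non-zero divisors never drop below $T^{-1}$: since $T\omega$ is an integer vector, $k.(T\omega)$ is an integer, hence of modulus at least one when it does not vanish.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from math import gcd


def period(omega):
    """Smallest t > 0 with t*omega in Z^n, for a nonzero rational vector.

    With components p_i/q_i in lowest terms, this is lcm(q_i) / gcd(p_i).
    """
    dens = [w.denominator for w in omega]
    nums = [w.numerator for w in omega]
    lcm_dens = reduce(lambda a, b: a * b // gcd(a, b), dens)
    return Fraction(lcm_dens, reduce(gcd, nums))


def smallest_divisor(omega, K):
    """min |k.omega| over integer k with 0 < |k|_1 <= K and k.omega != 0."""
    values = (
        abs(sum(c * w for c, w in zip(k, omega)))
        for k in product(range(-K, K + 1), repeat=len(omega))
        if 0 < sum(abs(c) for c in k) <= K
    )
    return min(v for v in values if v != 0)


# Hypothetical periodic vector, chosen only for illustration.
omega = (Fraction(1, 2), Fraction(1, 3), Fraction(5, 4))
T = period(omega)
for K in (2, 5, 10):
    # Non-resonant divisors stay above 1/T, whatever the cut-off K.
    d = smallest_divisor(omega, K)
    print(K, d, d >= 1 / T)
```

Exact rational arithmetic is used so that resonant integer vectors ($k.\omega = 0$) are detected without any floating-point tolerance.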

7. Lochak ([Loc92], see also [LN92] and [LNN94] for refinements) has
shown that averaging along the periodic orbits of the integrable
Hamiltonian is enough to obtain Nekhoroshev's estimates of stability when
the unperturbed Hamiltonian is strictly convex (or strictly quasi-convex,
that is the Hamiltonian is strictly convex when restricted to its energy
sub-levels). Indeed, using convexity, Lochak obtains open sets around
periodic orbits over which exponential stability holds. Then, Dirichlet's
theorem about simultaneous Diophantine approximation easily ensures that
these open sets cover the whole action space and yields the global
result, avoiding the difficult geometry of resonances. Put differently,
in the convex case one only needs dynamical information near resonances
of maximal multiplicity, which are completely characterized by periodic
orbits.

The goal of this paper is to extend Lochak's approach to a generic set
of integrable Hamiltonians. To do so, we will have to analyze the dynamics
in a neighbourhood of suitable resonances of any multiplicity by using
only successive averagings along periodic orbits together with Dirichlet's
theorem, and this will lead to exponential estimates of stability for
perturbations of a generic integrable Hamiltonian, as stated below.

Theorem 1.2. Consider an arbitrary real analytic integrable Hamiltonian
$h$ defined on a neighbourhood of a closed ball in $\mathbb{R}^n$. Then for almost any
$\xi \in \mathbb{R}^n$, the integrable Hamiltonian $h_\xi(I) = h(I) - \xi.I$ is exponentially
stable with the exponents $a = b = 3^{-1}(2n)^{-3n}$.

This will be a direct consequence of Theorems 2.2 and 2.4 below, in
section 2.1. This result is not new, see [Nie07], but the novelty here
is our method of proof, which completely avoids the fundamental problem
of small divisors and hence all the associated technicalities
(non-resonant domains, Fourier series, Fourier norm, ultra-violet cut-off
and so on). The analytic part of our proof of Nekhoroshev's estimates is
therefore reduced to its bare minimum: it is nothing but a classical
one-phase averaging, while our geometric part is based on a clever use of
Dirichlet's theorem along each solution. Applications of our method to
other problems will be discussed below, in section 2.2.

To conclude this introduction, we point out that the method of averaging
along periodic orbits has also been used successfully to recently re-prove
some KAM theorems without small divisors (see [KLDM06] and [KLDM07]),
even though their techniques are much more complicated.

2 Statement of results 
2.1 Set-up and results 

1. Let $B = B_R$ be the open ball centered at the origin of $\mathbb{R}^n$ of radius $R$
with respect to the supremum norm; the domain $\mathcal{D} = \mathbb{T}^n \times B$ will be our
phase space. To avoid trivial situations, we assume $n \geq 2$. Our Hamiltonian
function $H$ is real-analytic and bounded on $\mathcal{D}$ and it admits a holomorphic
extension to some complex neighbourhood of $\mathcal{D}$ of the form

\[ \mathcal{D}_{r,s} = \{ (\theta,I) \in (\mathbb{C}^n/\mathbb{Z}^n) \times \mathbb{C}^n \mid |\mathcal{I}(\theta)| < s, \ d(I,B) < r \}, \]

with two fixed numbers $r > 0$, $s > 0$, and where $\mathcal{I}(\theta)$ is the imaginary
part of $\theta$, $|\,.\,|$ the supremum norm on $\mathbb{C}^n$ and $d$ the associated distance
on $\mathbb{C}^n$. Equivalently, one can start with a Hamiltonian $H$, defined and
holomorphic on $\mathcal{D}_{r,s}$ and which preserves reality, that is $H$ is
real-valued for real arguments. Without loss of generality, we may assume
that $r < 1$ and $s < 1$. The space of such analytic functions on $\mathcal{D}_{r,s}$,
equipped with the supremum norm $|\,.\,|_{r,s}$, is obviously a Banach algebra
with respect to the multiplication of functions, and we shall denote it
by $\mathcal{A}_{r,s}$.

Our Hamiltonian $H \in \mathcal{A}_{r,s}$ is assumed to be close to integrable, that is
of the form

\[
\begin{cases}
H(\theta,I) = h(I) + f(\theta,I) \\
|f|_{r,s} < \varepsilon \ll 1,
\end{cases}
\tag{$*$}
\]

where $h$ is the integrable part and $f$ a small perturbation. Moreover, the
derivatives up to order 3 of $h$ are assumed to be bounded by some constant
$M > 1$, that is

\[ |\partial^k h(I)| < M, \quad 1 \leq |k|_1 \leq 3, \quad I \in B, \]

where $|k|_1 = |k_1| + \cdots + |k_n|$.

2. In order to obtain results of exponential stability, we do need to
impose some non-degeneracy condition on the unperturbed Hamiltonian. Let
$G(n,k)$ be the set of all vector subspaces of $\mathbb{R}^n$ of dimension $k$. We equip
$\mathbb{R}^n$ with the Euclidean scalar product, $\|\,.\,\|$ stands for the Euclidean norm,
and given an integer $L \in \mathbb{N}^*$, we define $G^L(n,k)$ as the subset of $G(n,k)$
consisting of those subspaces whose orthogonal complement can be spanned
by vectors $k \in \mathbb{Z}^n$ with $|k|_1 \leq L$.

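Definition 2.1. A function $h \in C^2(B)$ is said to be SDM if there exist
$\gamma > 0$ and $\tau \geq 0$ such that for any $L \in \mathbb{N}^*$, any $k \in \{1, \dots, n\}$ and any
$\Lambda \in G^L(n,k)$, there exists $(e_1, \dots, e_k)$ (resp. $(f_1, \dots, f_{n-k})$), an
orthonormal basis of $\Lambda$ (resp. of $\Lambda^\perp$), such that the function $h_\Lambda$ defined
on $B$ by

\[ h_\Lambda(\alpha,\beta) = h(\alpha_1 e_1 + \cdots + \alpha_k e_k + \beta_1 f_1 + \cdots + \beta_{n-k} f_{n-k}), \]

satisfies the following: for any $(\alpha,\beta) \in B$,

\[ \|\partial_\alpha h_\Lambda(\alpha,\beta)\| \leq \gamma L^{-\tau} \implies \|\partial_{\alpha\alpha} h_\Lambda(\alpha,\beta).\eta\| > \gamma L^{-\tau} \|\eta\| \]

for any $\eta \in \mathbb{R}^k \setminus \{0\}$.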

In other words, for any $(\alpha,\beta) \in B$, we have the following alternative:
either $\|\partial_\alpha h_\Lambda(\alpha,\beta)\| > \gamma L^{-\tau}$ or $\|\partial_{\alpha\alpha} h_\Lambda(\alpha,\beta).\eta\| > \gamma L^{-\tau} \|\eta\|$ for any
$\eta \in \mathbb{R}^k \setminus \{0\}$. This technical definition, which is a slight variation of a
notion introduced in [Nie07], is basically a quantitative transversality
condition which is stated in adapted coordinates. It is inspired on the
one hand by the steepness condition introduced by Nekhoroshev ([Nek77]),
where one has to look at the projection of the gradient map $\nabla h$ onto
affine subspaces, and on the other hand by the quantitative Morse-Sard
theory of Yomdin ([Yom83], [YC04]), where critical or "nearly-critical"
points of $h$ have to be quantitatively non-degenerate. The abbreviation
SDM stands for "Simultaneous Diophantine Morse" functions, and we refer
to Appendix B for more explanations on this condition and some
justifications of the latter terminology.

3. The set of SDM functions on $B$ with respect to $\gamma > 0$ and $\tau \geq 0$ will be
denoted by $SDM_\gamma^\tau(B)$, and we will also use the notations

\[ SDM^\tau(B) = \bigcup_{\gamma > 0} SDM_\gamma^\tau(B), \qquad SDM(B) = \bigcup_{\tau \geq 0} SDM^\tau(B). \]

The following result states that SDM functions are generic among sufficiently 
smooth functions. 

Theorem 2.2. Let $\tau > 2(n^2 + 1)$ and $h \in C^{2n+2}(B)$. Then for Lebesgue
almost all $\xi \in \mathbb{R}^n$, the function $h_\xi(I) = h(I) - \xi.I$ belongs to $SDM^\tau(B)$.

More precisely, there is a good notion of "full measure" in an infinite
dimensional vector space, which is called prevalence (see [OY05] and
[HK10] for nice surveys), and the previous theorem immediately gives the
following result.

Corollary 2.3. For $\tau > 2(n^2 + 1)$, $SDM^\tau(B)$ is prevalent in $C^{2n+2}(B)$.

4. Now we can state the main result of the paper. 

Theorem 2.4. Let $H$ be as in ($*$) and assume that the integrable part $h$
belongs to $SDM_\gamma^\tau(B)$ with $\tau \geq 2$ and $\gamma \leq 1$. Then there exist positive
constants $a$ and $b$ depending only on $n$ and $\tau$, and $\varepsilon_0$ depending only on
$h$, such that if $\varepsilon \leq \varepsilon_0$, for every initial action $I(0) \in B_{R/2}$ the
following estimates

\[ |I(t) - I(0)| < (n + 1)^2 \varepsilon^b, \quad |t| < \exp(\varepsilon^{-a}), \]

hold true.

More precisely, we can choose the exponents

\[ a = b = 3^{-1} (2(n + 1)\tau)^{-n}, \]

and $\varepsilon_0$ depending on the whole set of parameters $n$, $R$, $r$, $s$, $M$, $\gamma$ and $\tau$,
but no effort was made to improve the stability exponents since the
optimality of the constants involved is not our goal. Actually, this
optimality is not relevant for generic integrable Hamiltonians.

Let us add that the only property used on the integrable part $h$ to
derive these estimates is a specific steepness property, therefore the
proof is also valid, and in fact simpler, assuming the original steepness
condition of Nekhoroshev (see Appendix B). However, note that it is
precisely this "weaker" genericity assumption that allows new results of
stability near linearly stable invariant tori (see [Bou09]).

We emphasize again that it is not the result itself, but the method of
proof, which is new and leads to many improvements, as we explain below.

2.2 Comments and prospects 

To conclude this section we mention other problems for which our method
should apply, mainly the study of elliptic fixed points, Nekhoroshev's
estimates in lower regularity and finally estimates in large or infinite
dimensional Hamiltonian systems. In all these topics, the method of
periodic averagings has already proved to be very useful.

5. First, our analytic arguments are very intrinsic and this is important
in the study of the stability of elliptic fixed points in Hamiltonian
systems. Actually, in this case the transformation to action-angle
variables (via the symplectic polar coordinates) admits singularities
which do not allow one to derive stability results directly from
Nekhoroshev's theory. In the convex case, this problem has been overcome
independently by Fassò, Guzzo and Benettin ([FGB98]) and by Niederman
([Nie98]). Both use Cartesian coordinates: the first study uses the
classical approach and adapted Fourier expansions, while the second one
relies on periodic averagings and simultaneous Diophantine approximation.
The latter proof was clarified by Pöschel ([Pös99b]). With our approach,
we can remove the convexity hypothesis to have exponential stability
around an elliptic fixed point under a generic assumption on the
non-linear part. Furthermore, assuming a Diophantine condition on the
normal frequency, it is well-known since Morbidelli and Giorgilli
([MG95]) that one can even obtain super-exponential stability by
combining a sufficiently large number of Birkhoff normalizations with
Nekhoroshev's estimates. Here, with our method, generic results of
super-exponential stability around elliptic fixed points are also
available, and similarly around invariant Diophantine Lagrangian tori and
even isotropic reducible linearly stable tori. All these results are
contained in [Bou09].

6. Furthermore, one should mention that periodic averagings are
well-suited for non-analytic Hamiltonians and our formalism should also
carry over to this context. The advantage of periodic averagings is clear
already at the linear level when solving the homological equation: if the
system is of finite differentiability, then for a Diophantine frequency
vector the solution of the homological equation is subjected to a
disastrous loss of derivatives (larger than the number of degrees of
freedom) and one has to use rather cumbersome Fourier expansions, while
for a periodic frequency vector, this loss of derivatives is minimal and
one can use a more elegant integral formula. Hence in finite
differentiability, for a convex or generic unperturbed Hamiltonian system,
we can expect a proof of stability estimates (with of course a polynomial
bound on the time of stability) which is both simple (no small divisors)
and direct (no need to use the result in the analytic case and smoothing
techniques, which is the usual approach in KAM theory in finite
differentiability). Note that the analyticity of the studied system is
only needed for the construction of normal forms up to an exponentially
small remainder, but our steepness condition is generic for Hamiltonians
of finite but sufficiently high regularity. Concerning Gevrey regularity,
Marco and Sauzin ([MS02]) have already proved exponential estimates of
stability in the convex case, and for the $C^k$ regularity, polynomial
estimates of stability are indeed available (see [Bou10]). Both results
use only periodic averagings, so with our method they should also hold for
a generic integrable Hamiltonian. It can also be noticed that the
analytical properties of the expansions arising in periodic averagings are
accurately known ([Nei84], [RS96]).

7. Finally, results of stability for large Hamiltonian systems as a model
for statistical mechanics have been obtained by Bambusi and Giorgilli
([BG93]) and Bourgain ([Bou04]), and for non-linear evolution PDE seen as
an infinite dimensional Hamiltonian system mostly by Bambusi ([Bam99],
[BN02]) and then clarified by Pöschel ([Pös99a]). All these works use
Lochak's approach in the convex case. We believe that our method should
allow one to remove the convexity assumption in those results to obtain
more general statements.

8. The paper is organized as follows. In the next section, we state our
normal form and explain the main ideas, and then we give the proof of
Theorem 2.4. The complete proof of the normal form is deferred to
Appendix A, and in Appendix B we collect the basic properties of SDM
functions that we shall need and we prove Theorem 2.2 and Corollary 2.3.

9. In the text, we shall adopt the following notation taken from
[Pös99b]: we will write $u \,{<\!\!\cdot}\, v$ if there exists a constant $C > 1$ such that
$u < Cv$, where $C$ depends only on $n$, $R$, $r$, $s$, $M$, but not on $\tau$ and on the
small parameters $\varepsilon$ and $\gamma$. Similarly, we will use the notations
$u \,{\cdot\!\!<}\, v$, $u \,{=\!\!\cdot}\, v$ and $u \,{\cdot\!\!=}\, v$.

We shall use the following norms for vectors $v \in \mathbb{R}^n$ or $v \in \mathbb{C}^n$: $|\,.\,|$
will be the supremum norm, $|\,.\,|_1$ the $\ell^1$-norm and $\|\,.\,\|$ the Euclidean (or
Hermitian) norm.

3 Proof of Theorem 2.4

In this section, we consider the Hamiltonian ($*$), that is

\[
\begin{cases}
H(\theta,I) = h(I) + f(\theta,I) \\
|f|_{r,s} < \varepsilon
\end{cases}
\]

with $H \in \mathcal{A}_{r,s}$. As usual, the proof of exponential stability estimates
splits into an analytic part and a geometric part.

The analytic part is contained in section 3.1. It consists in the
construction of normal forms on a neighbourhood of specific resonances,
that is suitable coordinates which display the relevant part of the
perturbation on such a neighbourhood. Basically, we will reduce the
perturbation to a so-called resonant term which is dynamically
significant, and a general term which will only cause exponentially small
deviations.

The geometric part is expanded in section 3.2, and it is mainly based on
the properties of the underlying integrable system. The strategy will be
first to define a class of solutions, which we call restrained, and for
which it is obvious from our normal forms that they are stable for an
exponentially long time. Using this intermediate result, we will then show
that all solutions are in fact exponentially stable, and our main tools to
do this will be an adapted steepness property satisfied by our integrable
system, as well as a basic theorem of Dirichlet on simultaneous
Diophantine approximation.

3.1 Analytical part 

1. Let us begin by describing the neighbourhoods of resonances we will
consider. Given a sequence of linearly independent periodic vectors
$(\omega_1, \dots, \omega_n)$, with periods $(T_1, \dots, T_n)$, we define in the complex phase
space, for $j \in \{1, \dots, n\}$, the domains

\[ \mathcal{D}_{r_j,s_j}(\omega_j) = \{ (\theta,I) \in \mathcal{D}_{r_j,s_j} \mid |\nabla h(I) - \omega_j| \,{<\!\!\cdot}\, r_j \}, \]

with two sequences $(r_1, \dots, r_n)$ and $(s_1, \dots, s_n)$.

Remark 3.1. It is important to note that there is an implicit constant in
the previous definition represented by the dot, and we will not make it
explicit in order to avoid cumbersome and meaningless expressions. We
just mention that it depends only on $n$, $M$ and $j \in \{1, \dots, n\}$ and for
subsequent arguments it has to be chosen sufficiently large.

Informally, one has to view the domain $\mathcal{D}_{r_j,s_j}(\omega_j)$ as a neighbourhood,
in frequency space, of a periodic torus with a linear flow of frequency
$\omega_j$. Such domains will therefore be called nearly-periodic tori. We will
also use the real part of those domains, which are $\mathbb{T}^n \times B_{r_j}(\omega_j)$ where

\[ B_{r_j}(\omega_j) = \{ I \in B_{r_j} \mid |\nabla h(I) - \omega_j| \,{<\!\!\cdot}\, r_j \}, \]

with $B_{r_j} = \{ I \in \mathbb{R}^n \mid d(I,B) < r_j \}$.

2. Given an analytic function $f$ defined on $\mathcal{D}_{r_j,s_j}(\omega_j)$, we simply denote
its supremum norm by

\[ |f|_{r_j,s_j} = |f|_{\mathcal{D}_{r_j,s_j}(\omega_j)}. \]

For vector-valued functions, this definition is extended component-wise,
that is

\[ |\partial_\theta f|_{r_j,s_j} = \max_{1 \leq i \leq n} |\partial_{\theta_i} f|_{r_j,s_j}, \qquad |\partial_I f|_{r_j,s_j} = \max_{1 \leq i \leq n} |\partial_{I_i} f|_{r_j,s_j}. \]

We will write $l_j$ for the linear integrable Hamiltonian with frequency
$\omega_j$, that is $l_j(I) = \omega_j.I$ for $j \in \{1, \dots, n\}$. For any function $f$, we will
denote by $[f]_j$ its average along the periodic flow generated by $l_j$, that
is

\[ [f]_j = \frac{1}{T_j} \int_0^{T_j} f \circ \Phi_s^{l_j} \, ds. \]
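
The periodic average, and the integral formula solving the homological equation recalled in the introduction, can be checked numerically on a toy example. The sketch below is an illustration under stated assumptions: the frequency $\omega = (1/2, 1/3)$ (of period $T = 6$), the two-mode perturbation `f` and all function names are hypothetical choices, not objects from the paper. It compares a midpoint-rule evaluation of $[f]$ and of $\chi = \frac{1}{T}\int_0^T (f - [f]) \circ \Phi_s^l \, s \, ds$ with their closed forms; differentiating the closed form of $\chi$ along $\omega$ gives back $f - [f]$.

```python
import math

# Hypothetical data: omega = (1/2, 1/3) is periodic with period T = 6,
# since 6 * omega = (3, 2) is an integer vector.
OMEGA = (0.5, 1.0 / 3.0)
T = 6.0


def f(theta):
    """Toy perturbation on the torus R^2/Z^2 (not taken from the paper)."""
    t1, t2 = theta
    resonant = math.cos(2 * math.pi * (2 * t1 - 3 * t2))  # k = (2, -3): k.omega = 0
    oscillating = math.cos(2 * math.pi * (t1 + t2))       # k = (1, 1): k.omega = 5/6
    return resonant + oscillating


def flow(theta, s):
    """Linear flow of l(I) = omega.I acting on the angles."""
    return tuple(t + s * w for t, w in zip(theta, OMEGA))


def average(theta, steps=20000):
    """[f] = (1/T) int_0^T f o Phi_s ds, by the midpoint rule."""
    h = T / steps
    return sum(f(flow(theta, (i + 0.5) * h)) for i in range(steps)) * h / T


def chi(theta, steps=20000):
    """chi = (1/T) int_0^T s (f - [f]) o Phi_s ds, by the midpoint rule."""
    h = T / steps
    mean = average(theta, steps)  # [f] is constant along the flow
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += s * (f(flow(theta, s)) - mean)
    return total * h / T


theta = (0.1, 0.25)
# Closed forms for this toy f: [f] keeps the resonant mode, and chi is the
# oscillating mode divided by its divisor, sin(2 pi k.theta) / (2 pi k.omega).
exact_average = math.cos(2 * math.pi * (2 * theta[0] - 3 * theta[1]))
exact_chi = math.sin(2 * math.pi * (theta[0] + theta[1])) / (2 * math.pi * 5 / 6)
print(average(theta) - exact_average, chi(theta) - exact_chi)
```

No Fourier expansion is needed to evaluate `chi`: only values of `f` along one periodic orbit are used, which is the point of averaging along periodic orbits.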

3. Our interest here is to obtain normal forms on nearly-periodic tori up
to an exponentially small remainder with respect to some parameter
$m \in \mathbb{N}$, that we will choose later of order $\varepsilon^{-a}$ (during the proof of
Theorem 2.4). To this end, we will need the following conditions $(A_j)$,
for $j \in \{1, \dots, n\}$, where $(A_1)$ is

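\[
\begin{cases}
mT_1\varepsilon \,{\cdot\!\!<}\, r_1, \quad mT_1 r_1 \,{\cdot\!\!<}\, s_1, \quad 0 < r_1 \,{<\!\!\cdot}\, s_1, \\
B_{r_1}(\omega_1) \neq \emptyset, \quad r_1 \,{\cdot\!\!<}\, r, \quad s_1 \,{\cdot\!\!<}\, s,
\end{cases}
\]

and for $j \in \{2, \dots, n\}$, $(A_j)$ is

\[
\begin{cases}
mT_j\varepsilon \,{\cdot\!\!<}\, r_1 r_j, \quad mT_j r_j \,{\cdot\!\!<}\, s_j, \quad 0 < r_j \,{<\!\!\cdot}\, s_j, \\
\mathcal{D}_{r_j,s_j}(\omega_j) \subseteq \mathcal{D}_{2r_{j-1}/3,2s_{j-1}/3}(\omega_{j-1}), \quad B_{r_j}(\omega_j) \neq \emptyset.
\end{cases}
\]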

Let us explain briefly our assumptions. 

First, the condition on the inclusion of nearly-periodic tori is really
crucial. Indeed, since $\omega_1$ is periodic, the nearly-periodic torus
$\mathcal{D}_{r_1,s_1}(\omega_1)$ describes a neighbourhood of a resonance of multiplicity
$n - 1$. Now for $j \in \{2, \dots, n\}$, since $(\omega_1, \dots, \omega_j)$ are periodic and
independent, the inclusion assumption, together with the non-triviality
assumption, implies that the nearly-periodic torus $\mathcal{D}_{r_j,s_j}(\omega_j)$ also
describes a neighbourhood of a resonance, but of multiplicity $n - j$. Note
that such a condition will put an important restriction on our choice of
the sequence $(\omega_1, \dots, \omega_n)$ as they will have to be sufficiently close to
each other to ensure these inclusions.

Then, the condition on our parameter $m \in \mathbb{N}$,

\[ mT_j r_j \,{\cdot\!\!<}\, s_j, \quad j \in \{1, \dots, n\}, \]

is also important as it will later determine $m$ in terms of $\varepsilon$ and hence
the precise size of the exponentially small term.

Finally, the other conditions are only technical (and will be easily
arranged in the sequel), as they only give smallness conditions on $\varepsilon$.

4. Our normal form is described in the next proposition. 

Proposition 3.2. Consider $H = h + f$ as in ($*$) and let $j \in \{1, \dots, n\}$. If
$(A_i)$ is satisfied for any $i \in \{1, \dots, j\}$, then there exists an analytic
symplectic transformation

\[ \Psi_j : \mathcal{D}_{2r_j/3,2s_j/3}(\omega_j) \to \mathcal{D}_{r_1,s_1}(\omega_1) \]

such that

\[ H \circ \Psi_j = h + g_j + f_j, \]

with $\{g_j, l_i\} = 0$ for $i \in \{1, \dots, j\}$ and the estimates

\[ |\partial_\theta g_j|_{2r_j/3,2s_j/3} < \varepsilon, \qquad |\partial_\theta f_j|_{2r_j/3,2s_j/3} < e^{-m}\varepsilon. \]

Moreover, we have $\Psi_j = \Phi_1 \circ \cdots \circ \Phi_j$ with

\[ \Phi_i : \mathcal{D}_{2r_i/3,2s_i/3}(\omega_i) \to \mathcal{D}_{r_i,s_i}(\omega_i) \]

such that $|\Phi_i - \mathrm{Id}|_{2r_i/3,2s_i/3} \,{<\!\!\cdot}\, r_i$ for $i \in \{1, \dots, j\}$.

The proof of Proposition 3.2 goes by induction; it is not difficult but
quite long, and so it is deferred to Appendix A. Here we will try to give
a sketch in the case $j = 2$, explaining the main ideas without any
technicalities.

The first step is to prove the case $j = 1$, that is to find a
transformation $\Psi_1$ such that $H \circ \Psi_1 = h + g_1 + f_1$ with $\{g_1, l_1\} = 0$ and $f_1$
exponentially small with $m$. This is very classical. First observe that we
can write our original Hamiltonian as $H = h + g^0 + f^0$, where $g^0 = 0$
trivially satisfies $\{g^0, l_1\} = 0$ and $f^0 = f$ is of order $\varepsilon$. Now it is easy
to produce a transformation $\varphi^0$ such that $H \circ \varphi^0 = h + g^1 + f^1$, with
$\{g^1, l_1\} = 0$, but thanks to our assumption $(A_1)$ the remainder $f^1$ can be
made smaller, of order $e^{-1}\varepsilon$: this is an averaging process, $g^1 = [f^0]_1$ and
the remainder is estimated by Cauchy inequality. Now we only have to
iterate this process $m$ times, and writing $\Psi_1 = \Phi_1 = \varphi^0 \circ \cdots \circ \varphi^{m-1}$,
$g_1 = g^m$ and $f_1 = f^m$, we end up with $H \circ \Psi_1 = h + g_1 + f_1$ with the required
properties.

For the second step, we use the first one and consider
$H \circ \Psi_1 = h + g_1 + f_1$ which, by our assumption on the inclusion of domains
(this is part of $(A_2)$), is also defined on $\mathcal{D}_{r_2,s_2}(\omega_2)$. We can forget for
a moment about $f_1$ which is already exponentially small and consider $g_1$
as the new perturbation. Now as in the first step, we can construct a
transformation $\Phi_2$ such that $(h + g_1) \circ \Phi_2 = h + g_2 + \tilde{f}_2$ with $\{g_2, l_2\} = 0$
and $\tilde{f}_2$ exponentially small: we start with $h + g_1 = h + g_2^0 + f_2^0$, where
$g_2^0 = 0$, $f_2^0 = g_1$, and we find $\varphi_2^0$ such that
$(h + g_1) \circ \varphi_2^0 = h + g_2^1 + f_2^1$ where $g_2^1 = [g_1]_2$. After $m$ iterations we
finally have $g_2 = g_2^m$ and $\tilde{f}_2 = f_2^m$. Assuming we still have $\{g_2, l_1\} = 0$,
the conclusion follows: let $\Psi_2 = \Psi_1 \circ \Phi_2 = \Phi_1 \circ \Phi_2$ and
$f_2 = \tilde{f}_2 + f_1 \circ \Phi_2$, then $H \circ \Psi_2 = h + g_2 + f_2$ has the desired properties.

So it remains to explain why $\{g_2, l_1\} = 0$. The key observation is the
following: if $g_1$ satisfies $\{g_1, l_1\} = 0$, then

\[ [g_1]_2 = \frac{1}{T_2} \int_0^{T_2} g_1 \circ \Phi_s^{l_2} \, ds \]

and

\[ \chi = \frac{1}{T_2} \int_0^{T_2} (g_1 - [g_1]_2) \circ \Phi_s^{l_2} \, s \, ds \]

also satisfy $\{[g_1]_2, l_1\} = 0$ and $\{\chi, l_1\} = 0$. In Appendix A this will be
done by direct computations, but this is in fact a more general
phenomenon in normal form theory and it is not restricted to the
situation we consider here. Indeed, since $\{l_1, l_2\} = 0$, the linear
operators $L_{l_1} = \{\,.\,, l_1\}$ and $L_{l_2} = \{\,.\,, l_2\}$ commute, so that the kernel
of $L_{l_1}$ is invariant by $L_{l_2}$, and as $L_{l_2}$ is semi-simple, it is also
invariant under the projection onto the kernel of $L_{l_2}$ which is given by
the map $[\,.\,]_2$. This explains why $\{[g_1]_2, l_1\} = 0$. Now $g_1 - [g_1]_2$ is in the
kernel of $L_{l_1}$, and its unique pre-image by $L_{l_2}$ is given by $\chi$, hence
$\{\chi, l_1\} = 0$.

Remark 3.3. Note that this property was actually used by Bambusi ([Bam99],
Lemma 8.4).

5. Let us now examine the dynamical consequences of our normal form. 
As usual, it will be used to control the directions, if any, in which the action 
variables in these new coordinates can actually drift, and we shall come back 
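to our original coordinates at the beginning of Section 3.2.

Under the assumptions of Proposition 3.2, consider the Hamiltonian

\[ H_j = H \circ \Psi_j = h + g_j + f_j \]

on the domain $\mathcal{D}_{2r_j/3,2s_j/3}(\omega_j)$. Let $\mathcal{M}_j$ be
the $\mathbb{Z}$-module

\[ \mathcal{M}_j = \{k \in \mathbb{Z}^n \mid k.\omega_i = 0, \; i \in \{1,\dots,j\}\}, \]

whose rank is $n - j$, and $\Lambda_j = \mathcal{M}_j \otimes \mathbb{R}$ the
vector space spanned by $\mathcal{M}_j$.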

The following lemma is completely obvious using the definition of the 
Poisson bracket. 

Lemma 3.4. The equality $\{g_j, l_i\} = 0$, for all $i \in \{1,\dots,j\}$, is
equivalent to $\partial_\theta g_j \in \Lambda_j$.

Now consider a solution $(\theta^j(t), I^j(t))$ of $H_j$ with an initial
action $I^j(t_j) \in B_{2r_j/3}(\omega_j)$ for some $t_j \in \mathbb{R}$, and
define the time of escape of this solution as the smallest time
$\tilde t_j \in \, ]t_j, +\infty]$ for which
$I^j(\tilde t_j) \notin B_{2r_j/3}(\omega_j)$. The only information we shall
use from our normal form is contained in the next proposition.

Proposition 3.5. Let $\Pi_j$ be the projection onto the linear subspace
$\Lambda_j$, then with the previous notations, we have

\[ |I^j(t) - I^j(t_j) - \Pi_j(I^j(t) - I^j(t_j))| <\!\cdot \, \varepsilon, \quad t \in [t_j, e^m[ \, \cap \, [t_j, \tilde t_j[. \]

In particular,

\[ |I^n(t) - I^n(t_n)| <\!\cdot \, \varepsilon, \quad t \in [t_n, e^m[. \]

Proof. Let $\Pi_j^\perp$ be the projection onto the orthogonal complement of
$\Lambda_j$, so that $\Pi_j + \Pi_j^\perp$ is the identity and therefore

\[ |I^j(t) - I^j(t_j) - \Pi_j(I^j(t) - I^j(t_j))| = |\Pi_j^\perp(I^j(t) - I^j(t_j))|. \]

Now, as long as $t < \tilde t_j$, the equations of motion for
$H_j = h + g_j + f_j$ and the mean value theorem give

\[ |I^j(t) - I^j(t_j)| \leq |t - t_j| \, |\partial_\theta (g_j + f_j)|_{2r_j/3,2s_j/3}. \]

But $\{g_j, l_i\} = 0$ for $i \in \{1,\dots,j\}$, so by Lemma 3.4 we have
$\partial_\theta g_j \in \Lambda_j$, hence if we first project the equations
onto the orthogonal complement of $\Lambda_j$ we have

\[ |\Pi_j^\perp(I^j(t) - I^j(t_j))| \leq |t - t_j| \, |\partial_\theta f_j|_{2r_j/3,2s_j/3}. \]

Now since $|t - t_j| < e^m$ and
$|\partial_\theta f_j|_{2r_j/3,2s_j/3} <\!\cdot \, e^{-m}\varepsilon$, the
previous estimate gives

\[ |\Pi_j^\perp(I^j(t) - I^j(t_j))| <\!\cdot \, \varepsilon, \]

and therefore

\[ |I^j(t) - I^j(t_j) - \Pi_j(I^j(t) - I^j(t_j))| <\!\cdot \, \varepsilon \]

for $t \in [t_j, e^m[ \, \cap \, [t_j, \tilde t_j[$.

Finally, note that $\Pi_n$ is identically zero, so that the mean value theorem
immediately gives $\tilde t_n > e^m$ and the estimate

\[ |I^n(t) - I^n(t_n)| <\!\cdot \, \varepsilon, \quad t \in [t_n, e^m[, \]

follows easily. This concludes the proof. □

The interpretation of the above proposition is the following: if $\lambda_j$
is the affine subspace passing through $I^j(t_j)$ with direction space
$\Lambda_j$, then as long as $I^j(t)$ remains in the domain
$B_{r_j}(\omega_j)$, it is $\varepsilon$-close to $\lambda_j$ for an
exponentially long time with respect to $m$. This means that for that interval
of time, there is almost no variation of the action components in the
direction transversal to $\Lambda_j$, so that any potential drift has to occur
along that space.

3.2 Geometric part. 

In this section we will give the proof of Theorem 2.4 using the method
introduced by Niederman in [Nie04] and [Nie07]. Without loss of generality, we
will consider only solutions $(\theta(t), I(t))$ starting at time $t_0 = 0$
and evolving in positive time $t > 0$. We will first show that some specific
solutions are exponentially stable, but to define them we shall need some
extra notations.

6. Consider a sequence of linearly independent periodic vectors
$(\omega_1, \dots, \omega_n)$, with periods $(T_1, \dots, T_n)$, and two
decreasing sequences of real numbers $(r_1, \dots, r_n)$ and
$(s_1, \dots, s_n)$ satisfying conditions $(A_j)$, for $j \in \{1,\dots,n\}$.
Recall that from Proposition 3.2 we have a transformation

\[ \Psi_j : \mathcal{D}_{2r_j/3,2s_j/3}(\omega_j) \to \mathcal{D}_{r_1,s_1}(\omega_1), \quad j \in \{1,\dots,n\}, \]

such that $\Psi_j = \Phi_1 \circ \cdots \circ \Phi_j$ where

\[ \Phi_i : \mathcal{D}_{2r_i/3,2s_i/3}(\omega_i) \to \mathcal{D}_{r_i,s_i}(\omega_i), \quad i \in \{1,\dots,j\}, \]

satisfies the estimate $|\Phi_i - \mathrm{Id}|_{2r_i/3,2s_i/3} \cdot\!< r_i$.

By construction, our transformations preserve reality so that

\[ \Phi_i : \mathbb{T}^n \times B_{2r_i/3}(\omega_i) \to \mathbb{T}^n \times B_{r_i}(\omega_i), \quad i \in \{1,\dots,j\}, \]

with $|\Phi_i - \mathrm{Id}|_{2r_i/3} \cdot\!< r_i$. In particular, arranging
the implicit constant in the previous estimate ensures that the image of
$B_{2r_i/3}(\omega_i)$ under $\Phi_i$ contains the smaller domain
$B_{r_i/3}(\omega_i)$. From now on, we shall simply write

\[ B_i = B_{r_i/3}(\omega_i), \quad i \in \{1,\dots,n\}, \]

and for completeness $B_0 = B$.

Given a solution $(\theta(t), I(t)) \in B$ starting at time $t_0 = 0$, we can
define inductively the "averaged" solution $(\theta^i(t), I^i(t))$ for
$i \in \{1,\dots,n\}$ by

\[ \Phi_i(\theta^i(t), I^i(t)) = (\theta^{i-1}(t), I^{i-1}(t)) \]

as long as $I^{i-1}(t) \in B_i$, with
$(\theta^0(t), I^0(t)) = (\theta(t), I(t))$. Moreover, using our estimate on
$\Phi_i$ we have

\[ |I^i(t) - I^{i-1}(t)| < r_i, \quad i \in \{1,\dots,n\}, \tag{4} \]

during that time interval.

7. We can finally make our definition. 

Definition 3.6. Given $r_0 > 0$ and $m \in \mathbb{N}$, a solution
$(\theta(t), I(t))$ of the Hamiltonian ($*$), starting at time $t_0 = 0$, is
said to be restrained (by $r_0$, up to time $e^m$) if we can find sequences of:

(1) radii $(r_1, \dots, r_n)$, with $0 < r_n < \cdots < r_1 < r_0$;

(2) widths $(s_1, \dots, s_n)$, with $0 < s_n < \cdots < s_1$;

(3) independent periodic vectors $(\omega_1, \dots, \omega_n)$, with periods $(T_1, \dots, T_n)$;

(4) times $(t_1, \dots, t_n)$, with $0 = t_0 < t_1 < \cdots < t_n < t_{n+1} = e^m$,

satisfying, for $j \in \{0,\dots,n-1\}$, conditions $(A_{j+1})$ and the
following conditions $(B_j)$ defined by

\[
(B_j) \quad
\begin{cases}
|I^j(t) - I^j(t_j)| < r_j, & t \in [t_j, t_{j+1}], \\
|\nabla h(I^j(t_{j+1})) - \omega_{j+1}| < r_{j+1}.
\end{cases}
\]

Before explaining this definition, we need to make several remarks. First,
for $j \in \{0,\dots,n-2\}$ we will see that the first condition of
$(B_{j+1})$ is well defined by the second condition of $(B_j)$. Furthermore,
for $j \in \{0,\dots,n-1\}$ the last condition in $(B_j)$ implies in
particular that the set $B_{r_{j+1}}(\omega_{j+1})$ is non-empty, so we may
remove this assumption from $(A_{j+1})$. Finally, we can choose the same
sequence of widths $(s_1, \dots, s_n)$ for all solutions, therefore we may
already fix $s_j =\!\cdot \, s$ with a suitable constant and this simplifies
some conditions (for instance, the condition $m T_j r_j \cdot\!< s_j$
appearing in $(A_j)$ will be replaced by $m T_j r_j \cdot\!< 1$).

We have chosen the word "restrained" because for such a solution the
actions $I(t)$ (or some properly normalized actions $I^j(t)$) are forced to
pass close to a resonance at the time $t = t_j$, the multiplicity of which
decreases as $j$ increases, and moreover the variation of these (normalized)
actions is controlled on each time interval $[t_j, t_{j+1}]$. Hence after the
time $t_n$, the actions are in a domain free of resonances and they are easily
confined in view of the last part of Proposition 3.5. This is reminiscent of
the original mechanism of Nekhoroshev, but the fact that we consider each
solution individually will greatly simplify this geometric part.

8. Let us see how the actions of a restrained solution are easily confined
for an exponentially long time with respect to $m$. We shall write

\[ \rho_j = r_1 + \cdots + r_j, \]

for $j \in \{1,\dots,n\}$.

Proposition 3.7. Consider a restrained solution $(\theta(t), I(t))$, with an
initial action $I(0) \in B_{R/2}$. If

(i) $\varepsilon \cdot\!< r_n$;

(ii) $r_0 \cdot\!< R$,

then the estimates

\[ |I(t) - I(0)| < (n+1)^2 r_0, \quad 0 \leq t < e^m, \]

hold true.

Proof. First observe that for each $j \in \{1,\dots,n-1\}$, for
$t \in [t_j, t_{j+1}]$ we have

\[ |I(t) - I(t_j)| \leq |I(t) - I^j(t)| + |I^j(t) - I^j(t_j)| + |I^j(t_j) - I(t_j)|, \]

so the first part of $(B_j)$ and (4) yield

\[ |I(t) - I(t_j)| < 2\rho_j + r_j \tag{5} \]

while for $t \in [0, t_1]$, the first part of $(B_0)$ reads

\[ |I(t) - I(0)| < r_0. \tag{6} \]

Now let $t \in [0, e^m]$, then $t \in [t_j, t_{j+1}]$ for some
$j \in \{0,\dots,n\}$ (recall that $t_{n+1} = e^m$), and we will distinguish
three cases.

First assume that $t \in [0, t_1]$, in this case the conclusion follows by (6)
since $(n+1)^2 > 1$. Now assume $t \in [t_j, t_{j+1}]$ for some
$j \in \{1,\dots,n-1\}$, then we can write

\[ |I(t) - I(0)| \leq |I(t) - I(t_j)| + \sum_{i=0}^{j-1} |I(t_{i+1}) - I(t_i)|, \]

and by (5) and (6)

\[ |I(t) - I(0)| < \sum_{i=1}^{j} (2\rho_i + r_i) + r_0 < (n+1)^2 r_0, \]

since $r_i < r_0$ for $i \in \{1,\dots,j\}$. Finally, assume that
$t \in [t_n, t_{n+1}]$, then we can apply the second inequality of
Proposition 3.5 and (i) to estimate

\[ |I^n(t) - I^n(t_n)| <\!\cdot \, \varepsilon < r_n, \]

and so

\[ |I(t) - I(t_n)| < 2\rho_n + r_n \]

which gives

\[ |I(t) - I(0)| < \sum_{i=1}^{n} (2\rho_i + r_i) + r_0 < (n+1)^2 r_0. \]

To conclude, just note that $I(0) \in B_{R/2}$ and (ii) ensures that $I(t)$
remains in $B_R$ for $t < e^m$. □
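The counting behind the constant $(n+1)^2$ can be checked numerically. The
following Python sketch is only an illustration: the function name
`drift_bound` and the sample radii are assumptions made here, not objects of
the paper. It accumulates $\sum_i (2\rho_i + r_i) + r_0$ for radii $r_i < r_0$
and compares it with $(n+1)^2 r_0$; since $\rho_i < i\, r_0$, the sum is
bounded by $\sum_{i=1}^n (2i+1) r_0 + r_0 = (n+1)^2 r_0$.

```python
def drift_bound(radii, r0):
    """Return (total, bound) where
    total = sum_{i=1}^n (2*rho_i + r_i) + r0, with rho_i = r_1 + ... + r_i,
    bound = (n + 1)**2 * r0.
    The radii are assumed to satisfy 0 < r_i < r0 (hypothetical sample data).
    """
    if not all(0 < r < r0 for r in radii):
        raise ValueError("each radius must satisfy 0 < r_i < r0")
    rho = 0.0          # running partial sum rho_i
    total = r0         # contribution of the first interval [0, t_1]
    for r in radii:
        rho += r
        total += 2 * rho + r
    n = len(radii)
    return total, (n + 1) ** 2 * r0


if __name__ == "__main__":
    total, bound = drift_bound([0.5, 0.25, 0.125], 1.0)
    print(total, bound)
```

Even when every $r_i$ is taken very close to $r_0$, the accumulated drift stays
below the bound, which is why no smallness of the ratios $r_i / r_0$ is needed
at this step.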



9. Restrained solutions are exponentially stable, and now we will show 
that this is in fact true for all solutions. However, to use our steepness 
arguments this will be done quite indirectly, and so it is useful to introduce 
the following definition. 

Definition 3.8. Given $r_0 > 0$ and $m \in \mathbb{N}$, a solution
$(\theta(t), I(t))$ of the Hamiltonian ($*$), starting at time $t_0 = 0$, is
said to be drifting (to $r_0$, before time $e^m$) if there exists a time $t_*$
satisfying

\[ |I(t_*) - I(0)| = (n+1)^2 r_0, \quad 0 < t_* < e^m. \]

Of course, this definition makes sense only if $(n+1)^2 r_0 < R/2$. In view
of Proposition 3.7, drifting solutions cannot be restrained. However, we will
prove below that if such a drifting solution exists, it has to be restrained
under some assumptions on $r_0$, $m$ and $\varepsilon$, which will eventually
prove that all solutions are in fact exponentially stable.

More precisely, assuming the existence of a drifting solution, we will
construct a sequence of radii $(r_1, \dots, r_n)$, an increasing sequence of
times $(t_1, \dots, t_n)$ and a sequence of linearly independent vectors
$(\omega_1, \dots, \omega_n)$, with periods $(T_1, \dots, T_n)$ satisfying,
for $j \in \{0,\dots,n-1\}$, assumptions $(A_{j+1})$ and $(B_j)$. All
sequences will be built inductively, and we first describe the tools that we
shall need.

10. For $j \in \{1,\dots,n\}$, recall that $\Lambda_j$ is the vector space
spanned by

\[ \mathcal{M}_j = \{k \in \mathbb{Z}^n \mid k.\omega_i = 0, \; i \in \{1,\dots,j\}\}, \]

and that $\Pi_j$ (resp. $\Pi_j^\perp$) is the projection onto $\Lambda_j$
(resp. $\Lambda_j^\perp$). Let us define the integer

\[ L_j = \sup_{i \in \{1,\dots,j\}} \{|T_i \omega_i|\} \in \mathbb{N}^*, \quad j \in \{1,\dots,n\}. \]

For completeness, we set $\Lambda_0 = \mathbb{R}^n$, $L_0 = 1$ and in this
case $\Pi_0$ is nothing but the identity. To construct the sequence of times,
we will rely on the fact that our integrable part $h$ belongs to
$SDM^{\tau}_{\gamma}(B)$, so that it satisfies the following steepness
property (see Appendix B).

Lemma 3.9. For $j \in \{0,\dots,n-1\}$, let $\lambda_j$ be any affine subspace
with direction $\Lambda_j$, and take $r < 1$. Then for any continuous curve
$\Gamma : [0,1] \to \lambda_j \cap B$ with length

\[ |\Gamma(0) - \Gamma(1)| = r \cdot\!< \gamma L_j^{-\tau}, \]

there exists a time $t_* \in [0,1]$ such that

\[
\begin{cases}
|\Gamma(t) - \Gamma(0)| < r, & t \in [0, t_*], \\
|\Pi_j(\nabla h(\Gamma(t_*)))| \cdot\!> r^2.
\end{cases}
\]

Proof. For any $j \in \{1,\dots,n-1\}$, the orthogonal complement of
$\Lambda_j$ is spanned by $\omega_1, \dots, \omega_j$, hence by the integer
vectors $T_1\omega_1, \dots, T_j\omega_j$, so that $\Lambda_j$ belongs to
$G^{L_j}(n, n-j)$ with the integer $L_j$ defined above. Therefore one can
apply Proposition B.2 in Appendix B to get the required properties (note that
here we are using the supremum norm instead of the Euclidean norm, so the
implicit constants are different).

For $j = 0$, $\Gamma : [0,1] \to B = B \cap \mathbb{R}^n$, but since the
orthogonal complement of $\mathbb{R}^n$ is trivial one can take $L_0 = 1$. □

11. To construct the sequence of periodic vectors, we shall use the following
lemma, which is a straightforward application of Dirichlet's theorem on
simultaneous Diophantine approximation (see [Cas57]).

Lemma 3.10. Given any vector $v \in \mathbb{R}^n$ and any real number
$Q > 0$, there exists a $T$-periodic vector $\omega$ satisfying

\[ |v - \omega| \leq T^{-1} Q^{-\frac{1}{n-1}}, \quad |v|^{-1} \leq T \leq Q |v|^{-1}. \]

Proof. Fix any real number $Q > 0$. We can write the vector $v$, up to
re-ordering its components, as $v = |v|(\pm 1, x)$ with
$x \in \mathbb{R}^{n-1}$, and it will be enough to approximate $x$ by a
periodic vector. By a theorem of Dirichlet, we can find an integer $q$, with
$1 \leq q < Q$, such that

\[ |qx - p| \leq Q^{-\frac{1}{n-1}}, \]

for some $p \in \mathbb{Z}^{n-1}$. The vector $q^{-1}p$ is trivially
$q$-periodic, hence the vector $\omega = |v|(\pm 1, q^{-1}p)$ is $T$-periodic,
with $T = |v|^{-1} q$, therefore

\[ |v|^{-1} \leq T \leq Q |v|^{-1}, \]

and we have the estimate

\[ |v - \omega| \leq T^{-1} |qx - p| \leq T^{-1} Q^{-\frac{1}{n-1}}. \]

□
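The construction in this proof can be tried out numerically. The sketch below
is a brute-force illustration, not the authors' procedure: the function names
`dirichlet_denominator` and `periodic_approximation` are hypothetical, the
parameter `N` plays the role of $Q^{1/(n-1)}$ (so $Q = N^{n-1}$), the
supremum norm is used, the torus is assumed to be
$\mathbb{R}^n/\mathbb{Z}^n$ so that "$T$-periodic" means
$T\omega \in \mathbb{Z}^n$, and floating-point arithmetic is assumed accurate
enough for the sample input.

```python
import math


def dirichlet_denominator(x, N):
    """Search 1 <= q <= N**d (d = len(x)) with max_i |q*x_i - p_i| < 1/N.

    Existence is guaranteed by Dirichlet's theorem (pigeonhole principle);
    the search is exhaustive, so it is only meant for small N and d.
    """
    d = len(x)
    for q in range(1, N ** d + 1):
        p = [round(q * xi) for xi in x]
        err = max(abs(q * xi - pi) for xi, pi in zip(x, p))
        if err < 1.0 / N:
            return q, p, err
    raise ValueError("no denominator found (floating-point issue?)")


def periodic_approximation(v, N):
    """Return (omega, T) with T*omega an integer vector and omega close to v.

    v must have at least two components and a non-zero supremum norm.
    """
    norm = max(abs(c) for c in v)
    i0 = max(range(len(v)), key=lambda i: abs(v[i]))  # component equal to +-|v|
    x = [c / norm for i, c in enumerate(v) if i != i0]
    q, p, _ = dirichlet_denominator(x, N)
    ratios = iter(pi / q for pi in p)
    sign = 1.0 if v[i0] > 0 else -1.0
    omega = [norm * (sign if i == i0 else next(ratios)) for i in range(len(v))]
    return omega, q / norm


if __name__ == "__main__":
    v = [math.sqrt(2), math.sqrt(3), 2.0]
    omega, T = periodic_approximation(v, 10)
    print(omega, T, [T * w for w in omega])
```

With this normalisation, $|v - \omega| = T^{-1}|qx - p| < T^{-1} N^{-1}$ and
$|v|^{-1} \leq T \leq N^{n-1}|v|^{-1}$, which mirrors the two estimates of the
lemma.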

12. Now we can finally prove that drifting solutions are in fact restrained
under some assumptions. This will be done inductively, and for technical
reasons we separate the first step (Proposition 3.11) from the general
inductive step (Proposition 3.12).

Proposition 3.11. Let $(\theta(t), I(t))$ be a drifting solution. If
$r_0 \cdot\!< \gamma$, then there exist a time $t_1$, a $T_1$-periodic vector
$\omega_1$ and $r_1 =\!\cdot \, T_1^{-1} \varepsilon^{a_1}$ for some constant
$a_1$, satisfying $(B_0)$. Moreover, we have the estimates

\[ 1 <\!\cdot \, T_1 <\!\cdot \, \varepsilon^{-a_1(n-1)} r_0^{-2}, \quad 1 \leq L_1 <\!\cdot \, \varepsilon^{-a_1(n-1)} r_0^{-2}. \tag{7} \]

Proof. We need to construct $t_1$, $\omega_1$ and $r_1$ satisfying

(a) $|I(t) - I(0)| < r_0$, $t \in [0, t_1]$;

(b) $|\nabla h(I(t_1)) - \omega_1| < r_1$,

and the estimate (7). Consider the curve

\[ \Gamma_1 : t \in [0, t_*] \longmapsto I(t) \in B \subseteq \mathbb{R}^n. \]

Since we have a drifting solution, we can select $t_0^* \in [0, t_*]$ such that

\[ |\Gamma_1(t_0^*) - \Gamma_1(0)| = r_0. \]

Now using the fact that $h \in SDM^{\tau}_{\gamma}(B)$ and
$r_0 \cdot\!< \gamma$ (recall that $L_0 = 1$), we can apply Lemma 3.9 (the
case $j = 0$) to the curve $\Gamma_1$ restricted to $[0, t_0^*]$ to find a
time $t_1 \in [0, t_0^*]$ for which

\[
\begin{cases}
|I(t) - I(0)| < r_0, & t \in [0, t_1], \\
|\nabla h(I(t_1))| \cdot\!> r_0^2.
\end{cases}
\tag{8}
\]

The first inequality of (8) gives (a).

Now choose $Q_1 = \varepsilon^{-a_1(n-1)}$, for some constant $a_1$ yet to be
chosen, and apply Lemma 3.10 to approximate $\nabla h(I(t_1))$ by a
$T_1$-periodic vector $\omega_1$, that is

\[ |\nabla h(I(t_1)) - \omega_1| \leq T_1^{-1} Q_1^{-\frac{1}{n-1}} = T_1^{-1} \varepsilon^{a_1}. \tag{9} \]

Moreover, since

\[ r_0^2 <\!\cdot \, |\nabla h(I(t_1))| <\!\cdot \, 1, \]

the period $T_1$ satisfies the following estimate

\[ 1 <\!\cdot \, T_1 <\!\cdot \, \varepsilon^{-a_1(n-1)} r_0^{-2}. \tag{10} \]

Now choose $r_1 =\!\cdot \, T_1^{-1} \varepsilon^{a_1}$ so that (9) gives (b).
Finally, as $L_1 = |T_1 \omega_1|$ and

\[ |\omega_1| \leq |\nabla h(I(t_1))| + |\nabla h(I(t_1)) - \omega_1| <\!\cdot \, 1 \]

we obtain

\[ 1 \leq L_1 <\!\cdot \, \varepsilon^{-a_1(n-1)} r_0^{-2} \tag{11} \]

where the lower bound follows from the fact that $T_1 \omega_1$ is a non-zero
integer vector. The estimates (10) and (11) give (7). □

Proposition 3.12. Let $(\theta(t), I(t))$ be a drifting solution,
$j \in \{1,\dots,n-1\}$ and assume that there exist sequences
$(t_1, \dots, t_j)$, $(\omega_1, \dots, \omega_j)$ linearly independent and
$(r_1, \dots, r_j)$, satisfying assumptions $(A_i)$ and $(B_{i-1})$, for
$i \in \{1,\dots,j\}$. Assume also that

(i) $r_j \cdot\!< \min\{r, s\}$;

(ii) $m T_j \varepsilon \cdot\!< r_1 r_j$;

(iii) $m T_j r_j \cdot\!< 1$;

(iv) $(T_j r_j L_j^{-1})^{\tau} \cdot\!< \gamma L_j^{-\tau}$;

(v) $(T_j r_j L_j^{-1})^{\tau} \cdot\!< r_j$;

(vi) $\varepsilon \cdot\!< (T_j r_j L_j^{-1})^{2\tau}$;

(vii) $r_1 \cdot\!< r_0^2$.

Then there exist a time $t_{j+1}$, a $T_{j+1}$-periodic vector
$\omega_{j+1}$ and $r_{j+1} =\!\cdot \, T_{j+1}^{-1} \varepsilon^{a_{j+1}}$
for some constant $a_{j+1}$, satisfying $(A_{j+1})$ and $(B_j)$. Moreover, we
have the estimates

\[ 1 <\!\cdot \, T_{j+1} <\!\cdot \, \varepsilon^{-a_{j+1}(n-1)} r_0^{-2}, \quad 1 \leq L_{j+1} <\!\cdot \, \max_{i \in \{1,\dots,j+1\}} \{\varepsilon^{-a_i(n-1)}\} r_0^{-2}, \tag{12} \]

and if

(viii) $r_{j+1} \cdot\!< (T_j r_j L_j^{-1})^{2\tau}$,

then $\omega_{j+1}$ is linearly independent of $(\omega_1, \dots, \omega_j)$.

Proof. First note that for $j = 1$, we do not require that $t_1$, $\omega_1$
and $r_1$ satisfy $(A_1)$ since this is implied by the conditions (i), (ii)
and (iii), and for $j > 1$, the same conditions reduce assumption
$(A_{j+1})$ to the inclusion of real domains
$B_{r_{j+1}}(\omega_{j+1}) \subseteq B_{2r_j/3}(\omega_j)$ (recall that by
condition $(B_{j-1})$ these domains are non-empty, and that we have already
fixed $s_j =\!\cdot \, s$). Therefore, we need to construct $t_{j+1}$,
$\omega_{j+1}$ and $r_{j+1}$ satisfying

(a) $|I^j(t) - I^j(t_j)| < r_j$, $t \in [t_j, t_{j+1}]$;

(b) $|\nabla h(I^j(t_{j+1})) - \omega_{j+1}| < r_{j+1}$;

(c) $\omega_{j+1}$ is independent of $(\omega_1, \dots, \omega_j)$;

(d) $B_{r_{j+1}}(\omega_{j+1}) \subseteq B_{2r_j/3}(\omega_j)$,

and the estimates (12).

Let $\tilde t_j$ be the maximal time of existence within $B_j$ of the solution
$I^j(t)$ starting at $I^j(t_j)$. Since $(A_j)$ is satisfied, we can apply
Proposition 3.5 and for $t \in [t_j, \tilde t_j] \cap [t_j, e^m]$, we have

\[ |I^j(t) - I^j(t_j) - \Pi_j(I^j(t) - I^j(t_j))| <\!\cdot \, \varepsilon. \tag{13} \]

Now consider the curve

\[ \Gamma_{j+1} : t \in [t_j, \tilde t_j] \cap [t_j, e^m] \longmapsto I^j(t_j) + \Pi_j(I^j(t) - I^j(t_j)) \in \lambda_j \cap B, \]

where $\lambda_j$ is the affine subspace $I^j(t_j) + \Lambda_j$.

Claim: there exists a time $t_j^* \in [t_j, \tilde t_j] \cap [t_j, e^m]$ such
that

\[ |\Gamma_{j+1}(t_j^*) - \Gamma_{j+1}(t_j)| = |\Pi_j(I^j(t_j^*) - I^j(t_j))| = (T_j r_j L_j^{-1})^{\tau}. \]

Let us prove the claim. We have to distinguish two cases.

First case: $\tilde t_j < e^m$. We have

\[ |\nabla h(I^j(t_j)) - \omega_j| \leq |\nabla h(I^j(t_j)) - \nabla h(I^{j-1}(t_j))| + |\nabla h(I^{j-1}(t_j)) - \omega_j|, \]

and therefore

\[ |\nabla h(I^j(t_j)) - \omega_j| <\!\cdot \, r_j, \]

while by definition,

\[ |\nabla h(I^j(\tilde t_j)) - \omega_j| =\!\cdot \, r_j \]

with a sufficiently larger implicit constant (see Remark 3.1). Hence

\[ |\nabla h(I^j(\tilde t_j)) - \nabla h(I^j(t_j))| >\!\cdot \, r_j \]

and this implies

\[ |I^j(\tilde t_j) - I^j(t_j)| >\!\cdot \, r_j. \tag{14} \]

But conditions (v) and (vi) give in particular

\[ \varepsilon \cdot\!< r_j, \]

so that (13) and (14) yield

\[ |\Pi_j(I^j(\tilde t_j) - I^j(t_j))| >\!\cdot \, r_j. \]

Now using (v) again, this gives

\[ |\Pi_j(I^j(\tilde t_j) - I^j(t_j))| >\!\cdot \, (T_j r_j L_j^{-1})^{\tau}, \]

and so we can certainly find a time $t_j^* \in [t_j, \tilde t_j]$ such that

\[ |\Pi_j(I^j(t_j^*) - I^j(t_j))| = (T_j r_j L_j^{-1})^{\tau}. \]

Second case: $\tilde t_j > e^m$. We will first prove that
$t_* \in [t_j, e^m]$. Indeed, otherwise $t_*$ belongs to $[t_k, t_{k+1}]$ for
some $k \in \{0,\dots,j-1\}$ and we can write

\[ |I(t_*) - I(0)| \leq |I(t_*) - I(t_k)| + \sum_{i=0}^{k-1} |I(t_{i+1}) - I(t_i)|. \tag{15} \]

Each term of the right-hand side of (15) is easily estimated: using $(B_i)$
for $i \in \{0,\dots,k-1\}$ we have

\[ |I^i(t_{i+1}) - I^i(t_i)| < r_i, \quad |I^k(t_*) - I^k(t_k)| < r_k, \]

which implies, by the triangle inequality and the estimate (4),

\[ |I(t_{i+1}) - I(t_i)| < 2\rho_i + r_i, \quad |I(t_*) - I(t_k)| < 2\rho_k + r_k. \]

Moreover,

\[ |I(t_1) - I(0)| < r_0, \]

hence we find

\[ |I(t_*) - I(0)| < \sum_{i=1}^{k} (2\rho_i + r_i) + r_0 < (n+1)^2 r_0, \]

which of course contradicts the definition of our drifting time $t_*$.

Now to prove the claim, we argue by contradiction and suppose that

\[ |\Pi_j(I^j(t) - I^j(t_j))| < (T_j r_j L_j^{-1})^{\tau}, \quad t \in [t_j, e^m]. \]

Since $t_* \in [t_j, e^m]$, we can use the previous inequality together with
the estimate (13) and both conditions (v) and (vi) to first obtain

\[ |I^j(t_*) - I^j(t_j)| < r_j, \]

and then with the triangle inequality

\[ |I(t_*) - I(t_j)| < 2\rho_j + r_j. \]

Now, as in the argument above, writing

\[ |I(t_*) - I(0)| \leq |I(t_*) - I(t_j)| + \sum_{i=0}^{j-1} |I(t_{i+1}) - I(t_i)| \]

we find the same contradiction on the time $t_*$, which completes the proof of
the claim.

Now consider the restriction of the curve $\Gamma_{j+1}$ on the interval
$[t_j, t_j^*]$. Using our claim together with conditions (iv) and (v), we can
apply Lemma 3.9 to find a time $t_{j+1} \in [t_j, t_j^*]$ such that

\[
\begin{cases}
|\Pi_j(I^j(t) - I^j(t_j))| < (T_j r_j L_j^{-1})^{\tau}, & t \in [t_j, t_{j+1}], \\
|\Pi_j(\nabla h(\Gamma_{j+1}(t_{j+1})))| \cdot\!> (T_j r_j L_j^{-1})^{2\tau}.
\end{cases}
\tag{16}
\]

The first inequality of (16), together with (13) and conditions (v) and (vi),
gives

\[ |I^j(t) - I^j(t_j)| < r_j \]

for $t \in [t_j, t_{j+1}]$, hence (a) is verified. Now as in the first step,
choose $Q_{j+1} = \varepsilon^{-a_{j+1}(n-1)}$ for some constant $a_{j+1}$ to
be chosen later, and apply Lemma 3.10 to approximate
$\nabla h(I^j(t_{j+1}))$ by a $T_{j+1}$-periodic vector $\omega_{j+1}$, that is

\[ |\nabla h(I^j(t_{j+1})) - \omega_{j+1}| \leq T_{j+1}^{-1} Q_{j+1}^{-\frac{1}{n-1}} = T_{j+1}^{-1} \varepsilon^{a_{j+1}}. \tag{17} \]

Let $r_{j+1} =\!\cdot \, T_{j+1}^{-1} \varepsilon^{a_{j+1}}$ so that (b) is
verified by (17). To estimate the period $T_{j+1}$ and the number $L_{j+1}$,
we need a lower bound for $|\nabla h(I^j(t_{j+1}))|$ and we will use the fact
that we have such a lower bound for $|\nabla h(I(t_1))|$ (see the second
inequality of (8)). First note that one has easily

\[ |I^j(t_{j+1}) - I(t_1)| <\!\cdot \, r_1, \]

since $r_i \leq r_1$ for $i \in \{1,\dots,j\}$, and therefore

\[ |\nabla h(I^j(t_{j+1})) - \nabla h(I(t_1))| <\!\cdot \, r_1, \]

so choosing properly the constant in the condition (vii) we can ensure that

\[ |\nabla h(I^j(t_{j+1})) - \nabla h(I(t_1))| \cdot\!< r_0^2 \]

and hence

\[ |\nabla h(I^j(t_{j+1}))| \geq |\nabla h(I(t_1))| - |\nabla h(I^j(t_{j+1})) - \nabla h(I(t_1))| \cdot\!> r_0^2. \tag{18} \]

By Lemma 3.10 this gives the estimate

\[ 1 <\!\cdot \, T_{j+1} <\!\cdot \, \varepsilon^{-a_{j+1}(n-1)} r_0^{-2}. \tag{19} \]

Now as $|\omega_{j+1}| <\!\cdot \, 1$ this easily implies that

\[ 1 \leq L_{j+1} <\!\cdot \, \max_{i \in \{1,\dots,j+1\}} \{\varepsilon^{-a_i(n-1)}\} r_0^{-2}. \tag{20} \]

The estimates (19) and (20) give (12).

Next, having built $r_{j+1}$, we need to check that $\omega_{j+1}$ is
independent of $(\omega_1, \dots, \omega_j)$. First, by using the mean value
theorem, the estimate (13) and our condition (vi), we have

\[ |\nabla h(I^j(t_{j+1})) - \nabla h(\Gamma_{j+1}(t_{j+1}))| \cdot\!< (T_j r_j L_j^{-1})^{2\tau}, \]

and together with the second estimate of (16) this gives

\[ |\Pi_j(\nabla h(I^j(t_{j+1})))| \cdot\!> (T_j r_j L_j^{-1})^{2\tau}. \tag{21} \]

Furthermore, using (17),

\[ |\Pi_j(\nabla h(I^j(t_{j+1})) - \omega_{j+1})| \leq |\nabla h(I^j(t_{j+1})) - \omega_{j+1}| <\!\cdot \, r_{j+1}, \]

hence with (viii), we get

\[ |\Pi_j(\nabla h(I^j(t_{j+1})) - \omega_{j+1})| \cdot\!< (T_j r_j L_j^{-1})^{2\tau}. \tag{22} \]

Now by the estimates (21) and (22)

\[ |\Pi_j(\omega_{j+1})| \geq |\Pi_j(\nabla h(I^j(t_{j+1})))| - |\Pi_j(\nabla h(I^j(t_{j+1})) - \omega_{j+1})| \cdot\!> (T_j r_j L_j^{-1})^{2\tau} \]

and so $\Pi_j(\omega_{j+1})$ is non-zero, which means that $\omega_{j+1}$ is
not a linear combination of $\{\omega_1, \dots, \omega_j\}$. This proves (c).

Finally we can write

\[
\begin{aligned}
|\omega_{j+1} - \omega_j| \leq{} & |\omega_{j+1} - \nabla h(I^j(t_{j+1}))| + |\nabla h(I^j(t_{j+1})) - \nabla h(I^j(t_j))| \\
& + |\nabla h(I^j(t_j)) - \nabla h(I^{j-1}(t_j))| + |\nabla h(I^{j-1}(t_j)) - \omega_j|
\end{aligned}
\]

and hence

\[ |\omega_{j+1} - \omega_j| <\!\cdot \, (r_j + r_{j+1}) <\!\cdot \, r_j. \]

So given any $I \in B_{r_{j+1}}(\omega_{j+1})$, we have

\[ |\nabla h(I) - \omega_j| \leq |\nabla h(I) - \omega_{j+1}| + |\omega_{j+1} - \omega_j| <\!\cdot \, r_j, \]

so that $I \in B_{2r_j/3}(\omega_j)$, which gives (d). This ends the proof. □

13. Now we can eventually complete the proof of the main Theorem 2.4.

Proof of Theorem 2.4. As a consequence of Propositions 3.7, 3.11 and 3.12
we know that

\[ |I(t) - I(0)| < (n+1)^2 r_0, \quad 0 \leq t < e^m \]

provided that the parameters $r_0$, $m$ and $\varepsilon$ satisfy the
following eleven conditions:

(i) $r_{j+1} \cdot\!< (T_j r_j L_j^{-1})^{2\tau}$, $j \in \{1,\dots,n-1\}$;

(ii) $\varepsilon \cdot\!< (T_j r_j L_j^{-1})^{2\tau}$, $j \in \{1,\dots,n-1\}$;

(iii) $(T_j r_j L_j^{-1})^{\tau} \cdot\!< r_j$, for $j \in \{1,\dots,n-1\}$;

(iv) $m T_j r_j \cdot\!< 1$, for $j \in \{1,\dots,n\}$;

(v) $r_1 \cdot\!< r_0^2$;

(vi) $m T_j \varepsilon \cdot\!< r_1 r_j$, for $j \in \{1,\dots,n\}$;

(vii) $\varepsilon \cdot\!< r_n$;

(viii) $(T_j r_j L_j^{-1})^{\tau} \cdot\!< \gamma L_j^{-\tau}$, for $j \in \{1,\dots,n-1\}$;

(ix) $r_0 \cdot\!< \gamma$;

(x) $r_0 \cdot\!< R$;

(xi) $r_j \cdot\!< \min\{r, s\}$, for $j \in \{1,\dots,n\}$,

where $r_j =\!\cdot \, T_j^{-1} \varepsilon^{a_j}$, with $a_j$ to be defined
for $j \in \{1,\dots,n\}$, and

\[ 1 <\!\cdot \, T_j <\!\cdot \, \varepsilon^{-a_j(n-1)} r_0^{-2}, \quad 1 \leq L_j <\!\cdot \, \max_{i \in \{1,\dots,j\}} \{\varepsilon^{-a_i(n-1)}\} r_0^{-2}. \tag{23} \]

So let us choose $m =\!\cdot \, \varepsilon^{-a}$ and
$r_0 =\!\cdot \, \varepsilon^{b}$, for two constants $a$ and $b$ also to be
determined.

Using the estimates (23) on the periods $T_j$, $j \in \{1,\dots,n\}$ and the
numbers $L_j$, $j \in \{1,\dots,n-1\}$, as well as the form of $r_0$, $r_j$
for $j \in \{1,\dots,n\}$ and $m$, one can see that conditions (i) to (xi) are
implied by the following conditions:



(i') $a_{j+1} - 2n\tau \max_{i \in \{1,\dots,j\}}\{a_i\} - 4\tau b > 0$, $j \in \{1,\dots,n-1\}$;

(ii') $1 - 2n\tau a_j - 4\tau b > 0$, $j \in \{1,\dots,n-1\}$;

(iii') $(\tau - 1) a_j - 2b > 0$, for $j \in \{1,\dots,n-1\}$;

(iv') $a_j > a$, for $j \in \{1,\dots,n\}$;

(v') $a_1 - 2b > 0$;

(vi') $1 - a - (2n-1) a_j - n a_1 - 6b > 0$, for $j \in \{1,\dots,n\}$;

(vii') $1 - n a_n - 2b > 0$;

(viii') $\varepsilon < \gamma^{(\tau a_j)^{-1}}$, for $j \in \{1,\dots,n-1\}$;

(ix') $\varepsilon < \gamma^{b^{-1}}$;

(x') $\varepsilon < R^{b^{-1}}$;

(xi') $\varepsilon < (\min\{r, s\})^{(n a_j + 2b)^{-1}}$, for $j \in \{1,\dots,n\}$.



So we need to choose constants $a_j$, $j \in \{1,\dots,n\}$, $a$ and $b$ such
that the previous conditions are satisfied. First note that by (i'), the
sequence $a_j$, for $j \in \{1,\dots,n\}$, has to be increasing, hence

\[ \max_{i \in \{1,\dots,j\}} \{a_i\} = a_j, \quad j \in \{1,\dots,n\}. \]

Then using (v'), we observe that (i') is satisfied if
$a_{j+1} = 2\tau(n+1) a_j$ for $j \in \{1,\dots,n-1\}$, that is

\[ a_j = (2\tau(n+1))^{j-1} a_1. \]

Now for (ii') to be satisfied, one can choose

\[ a_1 = (2\tau(n+1))^{-n}, \]

so $a_j$, for $j \in \{2,\dots,n\}$, is determined by

\[ a_j = (2\tau(n+1))^{-n-1+j}. \]

Then, since $\tau \geq 2$, we may choose

\[ b = 3^{-1} a_1 = 3^{-1} (2\tau(n+1))^{-n} \]

and (iii') easily holds. Finally, we may also choose

\[ a = b = 3^{-1} (2\tau(n+1))^{-n} \]

so that (iv') is satisfied. With those values, it is easy to check that (v'),
(vi') and (vii') hold, recalling that $\tau \geq 2$ and $n \geq 2$. To
conclude, just note that (viii'), (ix'), (x') and (xi') are satisfied if
$\varepsilon < \varepsilon_0$ with a sufficiently small $\varepsilon_0$
depending on $n$, $R$, $r$, $s$, $M$, $\gamma$ and $\tau$. This ends the
proof. □
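The choice of exponents can be double-checked with exact rational arithmetic.
The sketch below is an illustration only: the function names are hypothetical,
and the seven inequalities coded in the comments are the exponent conditions
(i') to (vii') as read from the text above, with $a_j = (2\tau(n+1))^{-n-1+j}$
and $a = b = a_1/3$. The remaining conditions only ask for $\varepsilon$ to be
small and are not tested.

```python
from fractions import Fraction


def exponents(n, tau):
    """Return ([a_1, ..., a_n], a, b) for the choice made in the proof."""
    base = Fraction(2 * tau * (n + 1))
    a_seq = [base ** (j - n - 1) for j in range(1, n + 1)]
    b = a_seq[0] / 3
    return a_seq, b, b  # a = b


def check_exponent_conditions(n, tau):
    """Check the inequalities (i')-(vii') for the exponents above."""
    a_seq, a, b = exponents(n, tau)
    a1, an = a_seq[0], a_seq[-1]
    checks = []
    for j in range(1, n):                      # j = 1, ..., n-1
        aj, aj_next = a_seq[j - 1], a_seq[j]
        checks.append(aj_next - 2 * n * tau * max(a_seq[:j]) - 4 * tau * b > 0)  # (i')
        checks.append(1 - 2 * n * tau * aj - 4 * tau * b > 0)                    # (ii')
        checks.append((tau - 1) * aj - 2 * b > 0)                                # (iii')
    for aj in a_seq:                           # j = 1, ..., n
        checks.append(aj > a)                                                    # (iv')
        checks.append(1 - a - (2 * n - 1) * aj - n * a1 - 6 * b > 0)             # (vi')
    checks.append(a1 - 2 * b > 0)                                                # (v')
    checks.append(1 - n * an - 2 * b > 0)                                        # (vii')
    return all(checks)


if __name__ == "__main__":
    for n, tau in [(2, 2), (3, 2), (5, 3)]:
        print(n, tau, check_exponent_conditions(n, tau))
```

Using `Fraction` avoids any rounding issue, which matters here because $a_1$
decays like $(2\tau(n+1))^{-n}$ and quickly becomes very small.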



A Proof of the normal form 

In this first appendix we will give the proof of the normal form,
Proposition 3.2. We will closely follow the method of [Pos99b] and deduce our
result from an equivalent version in terms of vector fields (Proposition A.4
below).

A.l Preliminary estimates 

Before giving the proof, we will need some general estimates based on the 
classical Cauchy inequality. 

1. First consider the case of a function $f$ analytic on some domain
$\mathcal{D}_{r,s}$, and recall that

\[ |\partial_\theta f|_{r,s} = \max_{1 \leq i \leq n} |\partial_{\theta_i} f|_{r,s}, \quad |\partial_I f|_{r,s} = \max_{1 \leq i \leq n} |\partial_{I_i} f|_{r,s}. \]

We take $r'$, $s'$ such that $0 < r' < r$ and $0 < s' < s$. The first
estimate is classical, but we repeat the proof for convenience.

Lemma A.1. Under the previous assumptions, we have

\[ |\partial_I f|_{r-r',s} \leq \frac{1}{r'} |f|_{r,s}, \quad |\partial_\theta f|_{r,s-s'} \leq \frac{1}{s'} |f|_{r,s}. \]

Proof. For $x = (\theta, I) \in \mathcal{D}_{r-r',s}$ and any unit vector
$v \in \mathbb{C}^n$, consider the function

\[ F_{x,v} : t \in \mathbb{C} \longmapsto f(\theta, I + tv) \in \mathbb{C}. \]

This function is well-defined and holomorphic on the disc $|t| < r'$, so the
classical Cauchy estimate gives

\[ |F'_{x,v}(0)| \leq \frac{1}{r'} |f|_{r,s}, \]

from which the inequality for $\partial_I f$ follows easily by optimizing with
respect to $x$ and $v$. The estimate for $\partial_\theta f$ is completely
similar. □
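The one-variable Cauchy estimate used in this proof can be illustrated
numerically. The sketch below is an assumption-laden example, not part of the
paper: the function name `cauchy_derivative`, the test function `exp` and the
sample disc are chosen here for illustration. It evaluates $F'(z_0)$ through
the Cauchy integral formula on a circle of radius $r'$ (trapezoidal rule) and
compares it with the bound $\sup|F| / r'$ taken over the sampled circle.

```python
import cmath
import math


def cauchy_derivative(f, z0, radius, samples=256):
    """Return (derivative, bound) for a function f holomorphic near the
    closed disc |z - z0| <= radius.

    derivative approximates f'(z0) = (1 / (2*pi*i)) * contour integral of
    f(z) / (z - z0)**2, discretised with the trapezoidal rule;
    bound = (max of |f| over the sample points) / radius.
    """
    total = 0j
    sup = 0.0
    for m in range(samples):
        phi = 2 * math.pi * m / samples
        z = z0 + radius * cmath.exp(1j * phi)
        value = f(z)
        total += value * cmath.exp(-1j * phi)
        sup = max(sup, abs(value))
    return total / (samples * radius), sup / radius


if __name__ == "__main__":
    derivative, bound = cauchy_derivative(cmath.exp, 0.3, 0.5)
    print(abs(derivative), bound)
```

The discrete sum satisfies the same triangle-inequality bound as the integral,
so the computed derivative never exceeds the computed bound.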

2. Now let $j \in \{1,\dots,n\}$, and let $f$ and $g$ be analytic functions
defined on the domain

\[ \mathcal{D}_{r_j,s_j}(\omega_j) = \{(\theta, I) \in \mathcal{D}_{r_j,s_j} \mid |\nabla h(I) - \omega_j| < r_j\}, \]

where $\omega_j$ is a periodic vector. We can define a vector field norm on
$\mathcal{D}_{r_j,s_j}(\omega_j)$ by

\[ |X_f|_{r_j,s_j} = \max\left(|\partial_I f|_{r_j,s_j}, |\partial_\theta f|_{r_j,s_j}\right). \]

However, it will be more convenient to use the following "weighted" norm

\[ \|X_f\|_{r_j,s_j} = \max\left(|\partial_I f|_{r_j,s_j}, \; s_1 r_1^{-1} |\partial_\theta f|_{r_j,s_j}\right), \]

since the components $|\partial_I f|_{r_j,s_j}$ and
$|\partial_\theta f|_{r_j,s_j}$ may have very different sizes when estimated
from the size of $f$ by a Cauchy estimate (this idea is used in [DG96]).

Remark A.2. Note that under assumption $(A_1)$, $s_1 r_1^{-1} \geq 1$, so we
have the inequality $|X_f|_{r_j,s_j} \leq \|X_f\|_{r_j,s_j}$ and the equality
holds if $f$ is integrable. Moreover, note that each norm
$\|.\|_{r_j,s_j}$ is normalized with $s_1 r_1^{-1}$ (and not with
$s_j r_j^{-1}$): by our inclusions of domains, this implies in particular that

\[ \|.\|_{r_{j+1},s_{j+1}} \leq \|.\|_{2r_j/3,2s_j/3}. \]

It is well-known how to use the Cauchy inequality to estimate the size of the
Poisson bracket $\{f, g\}$ in terms of $f$ and $g$. Similarly, our second
estimate is concerned with the size of the vector field $[X_f, X_g]$ in terms
of $X_f$ and $X_g$. We take $r'$, $s'$ such that $0 < r' < r_j$ and
$0 < s' < s_j$.

Lemma A.3. Under the previous assumptions, we have

\[ \|[X_f, X_g]\|_{r_j-r',s_j-s'} \leq \frac{1}{r'} \|X_f\|_{r_j,s_j} \|X_g\|_{r_j,s_j}, \]

and moreover, if $g$ is integrable, then

\[ \|[X_f, X_g]\|_{r_j-r',s_j-s'} \leq \frac{1}{s'} \|X_f\|_{r_j,s_j} \|X_g\|_{r_j}. \]

Proof. First recall that

\[ [X_f, X_g] = \frac{d}{dt} (\Phi_t^g)^* X_f \Big|_{t=0}. \]

Now fix $x \in \mathcal{D}_{r_j-r',s_j-s'}(\omega_j)$, and let us define the
vector-valued function

\[ F_x : t \in \mathbb{C} \longmapsto (\Phi_t^g)^* X_f(x) \in \mathbb{C}^{2n}. \]

Clearly, the map $\Phi_t^g$ is analytic, and it sends
$\mathcal{D}_{r_j-r',s_j-s'}(\omega_j)$ into
$\mathcal{D}_{r_j,s_j}(\omega_j)$ for complex values of $t$ satisfying

\[ |t| < r' \|X_g\|_{r_j,s_j}^{-1}, \]

hence the function $F_x$ is well-defined and analytic on the disc
$|t| < r' \|X_g\|_{r_j,s_j}^{-1}$. So applying the classical Cauchy estimate
to each component of $F_x$ and optimizing with respect to
$x \in \mathcal{D}_{r_j-r',s_j-s'}(\omega_j)$ we obtain the desired inequality

\[ \|[X_f, X_g]\|_{r_j-r',s_j-s'} \leq \frac{1}{r'} \|X_g\|_{r_j,s_j} \|X_f\|_{r_j,s_j}. \]

In case $g$ is integrable, the map $\Phi_t^g$ leaves invariant the action
components, so the same reasoning can be applied on the larger disc

\[ |t| < s' \|X_g\|_{r_j}^{-1}, \]

giving the improved estimate

\[ \|[X_f, X_g]\|_{r_j-r',s_j-s'} \leq \frac{1}{s'} \|X_f\|_{r_j,s_j} \|X_g\|_{r_j}. \]

□



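A.2 Proof of Proposition 3.2

Now we can pass to the proof of Proposition 3.2. Given
$\tilde\varepsilon > 0$ which will be the size of our perturbating vector
field $X_f$, let us introduce a slightly modified set of conditions
$(\tilde A_j)$, for $j \in \{1,\dots,n\}$, where $(\tilde A_1)$ is

\[
\begin{cases}
m T_1 \tilde\varepsilon \cdot\!< r_1, \quad m T_1 r_1 \cdot\!< s_1, \quad 0 < r_1 <\!\cdot \, s_1, \\
B_{r_1}(\omega_1) \neq \emptyset,
\end{cases}
\]

and for $j \in \{2,\dots,n\}$, $(\tilde A_j)$ is

\[
\begin{cases}
m T_j \tilde\varepsilon \cdot\!< r_j, \quad m T_j r_j \cdot\!< s_j, \quad 0 < r_j <\!\cdot \, s_j, \\
B_{r_j}(\omega_j) \neq \emptyset, \quad \mathcal{D}_{r_j,s_j}(\omega_j) \subseteq \mathcal{D}_{2r_{j-1}/3,2s_{j-1}/3}(\omega_{j-1}).
\end{cases}
\]

These modifications take into account the fact that we will use the weighted
norms $\|.\|_{r_j,s_j}$, for $j \in \{1,\dots,n\}$.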

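3. The normal form lemma in terms of vector fields is the following.

Proposition A.4. Consider $H = h + f$ on the domain
$\mathcal{D}_{r_1,s_1}(\omega_1)$, with
$\|X_f\|_{r_1,s_1} <\!\cdot \, \tilde\varepsilon$, and let
$j \in \{1,\dots,n\}$. If $(\tilde A_i)$ is satisfied for any
$i \in \{1,\dots,j\}$, then there exists an analytic symplectic transformation

\[ \Psi_j : \mathcal{D}_{2r_j/3,2s_j/3}(\omega_j) \to \mathcal{D}_{r_1,s_1}(\omega_1) \]

such that

\[ H \circ \Psi_j = h + g_j + f_j, \]

with $\{g_j, l_i\} = 0$ for $i \in \{1,\dots,j\}$, and the estimates

\[ \|X_{g_j}\|_{2r_j/3,2s_j/3} <\!\cdot \, \tilde\varepsilon, \quad \|X_{f_j}\|_{2r_j/3,2s_j/3} <\!\cdot \, e^{-m} \tilde\varepsilon. \]

Moreover, we have $\Psi_j = \Phi_1 \circ \cdots \circ \Phi_j$ with

\[ \Phi_i : \mathcal{D}_{2r_i/3,2s_i/3}(\omega_i) \to \mathcal{D}_{r_i,s_i}(\omega_i) \]

such that $|\Phi_i - \mathrm{Id}|_{2r_i/3,2s_i/3} \cdot\!< r_i$.

Let us see how this implies our Proposition 3.2.

Proof of Proposition 3.2. We know that $|f|_{r,s} < \varepsilon$, so we can
apply Lemma A.1 with $r' = r_1$ and $s' = s_1$ to obtain

\[ |\partial_I f|_{r-r_1,s} \leq r_1^{-1} \varepsilon, \quad |\partial_\theta f|_{r,s-s_1} \leq s_1^{-1} \varepsilon, \]

and hence

\[ \|X_f\|_{r-r_1,s-s_1} \leq r_1^{-1} \varepsilon. \]

Now since $r_1 \cdot\!< r$ and $s_1 \cdot\!< s$ (this is part of assumption
$(A_1)$), we have the inclusion
$\mathcal{D}_{r_1,s_1}(\omega_1) \subseteq \mathcal{D}_{r-r_1,s-s_1}$ and hence

\[ \|X_f\|_{r_1,s_1} \leq r_1^{-1} \varepsilon. \]

Set $\tilde\varepsilon = r_1^{-1}\varepsilon$, then for any
$i \in \{1,\dots,j\}$, $(A_i)$ implies $(\tilde A_i)$ so that Proposition A.4
can be applied: there exists an analytic symplectic transformation

\[ \Psi_j : \mathcal{D}_{2r_j/3,2s_j/3}(\omega_j) \to \mathcal{D}_{r_1,s_1}(\omega_1) \]

such that

\[ H \circ \Psi_j = h + g_j + f_j, \]

with $\{g_j, l_i\} = 0$ for $i \in \{1,\dots,j\}$, and the estimates

\[ \|X_{g_j}\|_{2r_j/3,2s_j/3} <\!\cdot \, \varepsilon r_1^{-1}, \quad \|X_{f_j}\|_{2r_j/3,2s_j/3} <\!\cdot \, e^{-m} r_1^{-1} \varepsilon. \]

Recalling the definition of our norm $\|.\|_{r_j,s_j}$, this readily implies

\[ |\partial_\theta g_j|_{2r_j/3,2s_j/3} <\!\cdot \, \varepsilon s_1^{-1} <\!\cdot \, \varepsilon, \quad |\partial_\theta f_j|_{2r_j/3,2s_j/3} <\!\cdot \, e^{-m} \varepsilon s_1^{-1} <\!\cdot \, e^{-m} \varepsilon. \]

Moreover, we have $\Psi_j = \Phi_1 \circ \cdots \circ \Phi_j$ with

\[ \Phi_i : \mathcal{D}_{2r_i/3,2s_i/3}(\omega_i) \to \mathcal{D}_{r_i,s_i}(\omega_i) \]

such that $|\Phi_i - \mathrm{Id}|_{2r_i/3,2s_i/3} \cdot\!< r_i$. □

4. Hence it remains to prove Proposition A.4. This will be done by induction
on $j \in \{1,\dots,n\}$, and for that we shall need two iterative lemmas.
The first iterative lemma is needed for the first step, that is to prove the
statement for $j = 1$, and it can be seen as an averaging process with respect
to one fast angle.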

Lemma A.5 (First iterative lemma). Consider $H = h + g + f$ on the domain
$\mathcal{D}_{r_1,s_1}(\omega_1)$, with $h$ integrable, $\{g, l_1\} = 0$, and
assume that

l|-^sl|ri,si *5 1 1 l r l i s i ^ ^' 

If we have 

Tie < r' < s' 

with two real numbers r' , s' satisfying < r' < r\ and < s' < 3%, then 
there exists an analytic symplectic transformation 

fl '■ A-i-rVi-s'^l) ^ri,si(wi) 

such that \(pi — Id| ri _ r ' jSl _ s ' < T\i and 

H otpi = h + g + + / + , 
with {g + ,li} = and the estimates 

\\Xg^_ \\ri, si £■> ^gllrijSi < 1 |-^/+ I In— r',s\— s' ^"^7 "^1^" 

Proof. We have $H = h + g + f$, with $h$ integrable, $g$ satisfying
$\{g, l_1\} = 0$ and $f$ a general term. Let us write

\[ [f]_1 = \frac{1}{T_1} \int_0^{T_1} f \circ \Phi_t^{l_1} \, dt, \]

the average of $f$ along the Hamiltonian flow of $l_1$.

Our transformation $\varphi_1 = \Phi_1^{\chi}$ will be the time-one map of the
Hamiltonian flow generated by some auxiliary function $\chi$ which satisfies

\[ \{\chi, l_1\} = f - [f]_1. \]

The latter equation is easily solved by

\[ \chi = \frac{1}{T_1} \int_0^{T_1} (f - [f]_1) \circ \Phi_t^{l_1} \, t \, dt, \tag{24} \]

and by Taylor formula, our transformed Hamiltonian writes

\[ H \circ \varphi_1 = h + g_+ + f_+, \]

with

\[ g_+ = g + [f]_1, \quad f_+ = \int_0^1 \{h - l_1 + g + f_t, \chi\} \circ \Phi_t^{\chi} \, dt, \]

and $f_t = t f + (1 - t)[f]_1$. By construction, $g_+$ still satisfies
$\{g_+, l_1\} = 0$, and



1 



x 9+ - x g = x [f]l = - (&rx f dt. 

Our hypothesis ||Xy|| rijSl < e immediately gives \\X g+ — X g \\ riiSl < e and 
also 11^+ 1 In, si <•£• Similarly using (fM|) we have the expression 



x x = ±£\^yx Mf]l tdt, 



and hence ||X x || riiS1 < T\i. By the hypothesis T\E < r' < s' our transfor- 
mation (fx maps T> ri _ r i S1 _ s i(u)\) into T> rijSl (u)i) and 

\<Pi ~ Id| ri - r ', Sl -s' < Tie. 
Therefore it remains to estimate the vector field 

x f+ = ! ($>mx h - h +X g + X ft ,X x ]dt, 
Jo 

and for that it is enough to estimate the brackets [Xf t ,X x ], LY ff ,X x ] and 
[Xh-i-t, X x ]. Using Lemma (|A.3j) . we find 

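\[ \|[X_{f_t},X_\chi]\|_{r_1-r',s_1-s'} <\!\!\cdot\, \frac{1}{r'}\|X_{f_t}\|_{r_1,s_1}\|X_\chi\|_{r_1,s_1} <\!\!\cdot\, \frac{\varepsilon}{r'}T_1\varepsilon \]

and

\[ \|[X_g,X_\chi]\|_{r_1-r',s_1-s'} <\!\!\cdot\, \frac{1}{r'}\|X_g\|_{r_1,s_1}\|X_\chi\|_{r_1,s_1} <\!\!\cdot\, \frac{\varepsilon}{r'}T_1\varepsilon. \]

For the last bracket, note that $h - l_1$ is integrable so that we can use the
improved estimate in Lemma A.3. By definition of the domain $\mathcal{D}_{r_1,s_1}(\omega_1)$,
we have $\|X_{h-l_1}\|_{r_1} <\!\!\cdot\, r_1$ and hence

\[ \|[X_{h-l_1},X_\chi]\|_{r_1-r',s_1-s'} <\!\!\cdot\, \frac{1}{s'}\|X_{h-l_1}\|_{r_1}\|X_\chi\|_{r_1,s_1} <\!\!\cdot\, \frac{r_1}{s'}T_1\varepsilon. \]

Putting the last three estimates together we arrive at

\[ \|X_{f_+}\|_{r_1-r',s_1-s'} <\!\!\cdot \left(\frac{r_1}{s'}+\frac{\varepsilon}{r'}\right)T_1\varepsilon. \]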

□ 
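As a purely illustrative numerical check of the choice (24), and not as part of the proof, the following Python sketch verifies the equation $\{\chi,l\} = f - [f]$ on a toy example. Everything specific here is an assumption made for the illustration: the torus is taken to be $\mathbb{R}^2/\mathbb{Z}^2$, the frequency `OMEGA = (1, 2)` is periodic with period 1, the function `f` is a hypothetical trigonometric polynomial depending on the angles only, and the bracket $\{\chi,l\}$ is computed as the derivative of $\chi$ along the flow of $l$ (a sign convention under which (24) solves the equation).

```python
import math

# Hypothetical illustration data (not from the paper): a 1-periodic frequency
# on the torus R^2/Z^2 and a small trigonometric polynomial f.
OMEGA = (1.0, 2.0)
PERIOD = 1.0


def f(theta):
    """Perturbation with one non-resonant and one resonant harmonic."""
    t1, t2 = theta
    nonresonant = math.cos(2 * math.pi * (t1 - t2))    # k = (1, -1), k.omega = -1
    resonant = math.sin(2 * math.pi * (2 * t1 - t2))   # k = (2, -1), k.omega = 0
    return nonresonant + resonant


def flow(theta, t):
    """Angle part of the flow of l(I) = omega.I: a translation along omega."""
    return (theta[0] + t * OMEGA[0], theta[1] + t * OMEGA[1])


def midpoint(func, a, b, n=2000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def average(theta):
    """[f](theta): time average of f along the periodic flow."""
    return midpoint(lambda t: f(flow(theta, t)), 0.0, PERIOD) / PERIOD


def chi(theta):
    """chi = (1/T) * int_0^T t (f - [f]) o Phi_t dt, as in formula (24)."""
    avg = average(theta)  # [f] is constant along the flow, so compute it once
    return midpoint(lambda t: t * (f(flow(theta, t)) - avg), 0.0, PERIOD) / PERIOD


def bracket_with_l(func, theta, delta=1e-3):
    """{func, l}, computed as the derivative of func along the flow of l."""
    forward = func(flow(theta, delta))
    backward = func(flow(theta, -delta))
    return (forward - backward) / (2 * delta)
```

In this toy case the average keeps exactly the resonant harmonic, and `bracket_with_l(chi, theta)` agrees with `f(theta) - average(theta)` up to discretization error.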

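Our second iterative lemma is needed for the inductive step, that is to
go from $j$ to $j+1$. This is just a simple extension of the previous one. Let
$j \in \{1,\dots,n-1\}$.

Lemma A.6 (Second iterative lemma). Consider $H = h + g + f$ on the
domain $\mathcal{D}_{r_{j+1},s_{j+1}}(\omega_{j+1})$, with $h$ integrable, $\{g,l_i\}=0$ for $i \in \{1,\dots,j+1\}$,
$\{f,l_{i'}\}=0$ for $i' \in \{1,\dots,j\}$, and assume that

\[ \|X_g\|_{r_{j+1},s_{j+1}} <\!\!\cdot\, \varepsilon, \qquad \|X_f\|_{r_{j+1},s_{j+1}} <\!\!\cdot\, \varepsilon. \]

If we have

\[ T_{j+1}\varepsilon \,\cdot\!\!< r' \,\cdot\!\!< s' \]

with two real numbers $r'$, $s'$ satisfying $0 < r' < r_{j+1}$ and $0 < s' < s_{j+1}$, then
there exists an analytic symplectic transformation

\[ \varphi_{j+1} : \mathcal{D}_{r_{j+1}-r',s_{j+1}-s'}(\omega_{j+1}) \longrightarrow \mathcal{D}_{r_{j+1},s_{j+1}}(\omega_{j+1}) \]

such that $|\varphi_{j+1}-\mathrm{Id}|_{r_{j+1}-r',s_{j+1}-s'} <\!\!\cdot\, T_{j+1}\varepsilon$ and

\[ H\circ\varphi_{j+1} = h + g_+ + f_+, \]

with $\{g_+,l_i\}=0$ for $i \in \{1,\dots,j+1\}$, $\{f_+,l_{i'}\}=0$ for $i' \in \{1,\dots,j\}$, and
the estimates

\[ \|X_{g_+}\|_{r_{j+1},s_{j+1}} <\!\!\cdot\, \varepsilon, \quad \|X_{g_+}-X_g\|_{r_{j+1},s_{j+1}} <\!\!\cdot\, \varepsilon, \quad \|X_{f_+}\|_{r_{j+1}-r',s_{j+1}-s'} <\!\!\cdot \left(\frac{r_{j+1}}{s'}+\frac{\varepsilon}{r'}\right)T_{j+1}\varepsilon. \]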



Proof. Our Hamiltonian is $H = h+g+f$, $h$ is integrable and we have $\{g,l_i\}=0$
for $i \in \{1,\dots,j+1\}$ and $\{f,l_{i'}\}=0$ for $i' \in \{1,\dots,j\}$. Once again, our
transformation $\varphi_{j+1} = \Phi_1^{\chi}$ will be the time-one map of the Hamiltonian flow
generated by some auxiliary function $\chi$.
We choose

\[ \chi = \frac{1}{T_{j+1}}\int_0^{T_{j+1}} (f-[f]_{j+1})\circ\Phi_t^{l_{j+1}}\,t\,dt, \tag{25} \]

where $[\,\cdot\,]_{j+1}$ is the averaging along the Hamiltonian flow of $l_{j+1}$. Introducing
the notation $f_t = tf + (1-t)[f]_{j+1}$, like in Lemma A.5 we have

\[ H\circ\varphi_{j+1} = h + g_+ + f_+ \]

with

\[ g_+ = g + [f]_{j+1}, \qquad f_+ = \int_0^1 \{h - l_{j+1} + g + f_t,\chi\}\circ\Phi_t^{\chi}\,dt. \]

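We need to verify that we still have $\{g_+,l_i\}=0$ for $i \in \{1,\dots,j+1\}$ and
$\{f_+,l_{i'}\}=0$ for $i' \in \{1,\dots,j\}$. By definition, $\{[f]_{j+1},l_{j+1}\}=0$, and for
$i' \in \{1,\dots,j\}$, we compute

\[
\begin{aligned}
\{[f]_{j+1},l_{i'}\} &= \frac{1}{T_{j+1}}\int_0^{T_{j+1}} \{f\circ\Phi_t^{l_{j+1}},l_{i'}\}\,dt \\
&= \frac{1}{T_{j+1}}\int_0^{T_{j+1}} \{f\circ\Phi_t^{l_{j+1}},l_{i'}\circ\Phi_t^{l_{j+1}}\}\,dt \\
&= \frac{1}{T_{j+1}}\int_0^{T_{j+1}} \{f,l_{i'}\}\circ\Phi_t^{l_{j+1}}\,dt \\
&= 0.
\end{aligned}
\]

This proves that $\{g_+,l_i\} = \{g + [f]_{j+1},l_i\} = 0$ for $i \in \{1,\dots,j+1\}$. Now
a completely similar calculation shows that for $i' \in \{1,\dots,j\}$, $\{\chi,l_{i'}\}=0$,
hence $l_{i'}\circ\Phi_t^{\chi} = l_{i'}$ and therefore

\[ \{f_+,l_{i'}\} = \int_0^1 \{\{h - l_{j+1} + g + f_t,\chi\},l_{i'}\}\circ\Phi_t^{\chi}\,dt. \]

The double bracket in the expression above is zero, as a consequence of the
Jacobi identity and the fact that $\{h - l_{j+1} + g + f_t,l_{i'}\} = \{\chi,l_{i'}\} = 0$, hence
$\{f_+,l_{i'}\}=0$ for $i' \in \{1,\dots,j\}$.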

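To conclude, using our hypothesis $T_{j+1}\varepsilon \,\cdot\!\!< r' \,\cdot\!\!< s'$, as in Lemma A.5
we can show that our transformation $\varphi_{j+1}$ maps $\mathcal{D}_{r_{j+1}-r',s_{j+1}-s'}(\omega_{j+1})$ into
$\mathcal{D}_{r_{j+1},s_{j+1}}(\omega_{j+1})$ with $|\varphi_{j+1}-\mathrm{Id}|_{r_{j+1}-r',s_{j+1}-s'} <\!\!\cdot\, T_{j+1}\varepsilon$, and the estimates

\[ \|X_{g_+}\|_{r_{j+1},s_{j+1}} <\!\!\cdot\, \varepsilon, \qquad \|X_{g_+}-X_g\|_{r_{j+1},s_{j+1}} <\!\!\cdot\, \varepsilon, \]

\[ \|X_{f_+}\|_{r_{j+1}-r',s_{j+1}-s'} <\!\!\cdot \left(\frac{r_{j+1}}{s'}+\frac{\varepsilon}{r'}\right)T_{j+1}\varepsilon \]

are obtained in a completely analogous way. □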
5. We can finally complete the proof of our normal form, Proposition A.4.

Proof of Proposition A.4. The proof is by induction on $j \in \{1,\dots,n\}$.

First step. Here we assume $(A_1)$ and we will apply $m$ times our first
iterative Lemma A.5, starting with the Hamiltonian

\[ H^0 = H = h + g^0 + f^0 \]

where $g^0 = 0$ and $f^0 = f$, and choosing uniformly at each step

\[ r' = (3m)^{-1}r_1, \qquad s' = (3m)^{-1}s_1. \]

Since $m \geq 1$, we have $0 < r' < r_1$, $0 < s' < s_1$ and using $(A_1)$, we have

\[ T_1\varepsilon \,\cdot\!\!< r' \,\cdot\!\!< s', \]

so that the lemma can indeed be applied at each step. For $i \in \{0,\dots,m-1\}$,
the Hamiltonian $H^i = h + g^i + f^i$ at step $i$ is transformed into

\[ H^{i+1} = H^i\circ\varphi_1^i = h + g^{i+1} + f^{i+1}. \]

For each $i \in \{0,\dots,m\}$, we obviously have $\{g^i,l_1\}=0$ and we claim that
the estimates

\[ \|X_{g^i}\|_{r_1^i,s_1^i} <\!\!\cdot\, \varepsilon, \qquad \|X_{f^i}\|_{r_1^i,s_1^i} \leq \varepsilon_i \tag{26} \]

hold true, where we have set $\varepsilon_i = e^{-i}\varepsilon$, $r_1^i = r_1 - ir'$ and $s_1^i = s_1 - is'$.
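Assuming this claim, given $i \in \{0,\dots,m-1\}$, we have

\[ \varphi_1^i : \mathcal{D}_{r_1^{i+1},s_1^{i+1}}(\omega_1) \longrightarrow \mathcal{D}_{r_1^i,s_1^i}(\omega_1) \]

so that $\Phi_1 = \varphi_1^0\circ\cdots\circ\varphi_1^{m-1}$ is well defined from $\mathcal{D}_{2r_1/3,2s_1/3}(\omega_1)$ to $\mathcal{D}_{r_1,s_1}(\omega_1)$.
Setting $g_1 = g^m$ and $f_1 = f^m$, we finally obtain

\[ H\circ\Phi_1 = h + g_1 + f_1 \]

with the desired properties, that is $\{g_1,l_1\}=0$ and the estimates

\[ \|X_{g_1}\|_{2r_1/3,2s_1/3} <\!\!\cdot\, \varepsilon, \qquad \|X_{f_1}\|_{2r_1/3,2s_1/3} \leq e^{-m}\varepsilon. \]

Note that since $\|X_{f^i}\|_{r_1^i,s_1^i} \leq \varepsilon_i$ for $i \in \{0,\dots,m-1\}$, we obtain

\[ |\varphi_1^i-\mathrm{Id}|_{r_1^{i+1},s_1^{i+1}} <\!\!\cdot\, T_1\varepsilon_i, \]

which gives

\[ |\Phi_1-\mathrm{Id}|_{2r_1/3,2s_1/3} <\!\!\cdot\, T_1\sum_{k=0}^{m-1}\varepsilon_k. \]

But recall that $mT_1\varepsilon \,\cdot\!\!< r_1$ and hence we can arrange

\[ |\Phi_1-\mathrm{Id}|_{2r_1/3,2s_1/3} \,\cdot\!\!< r_1. \]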

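Therefore to conclude the proof we need to establish the estimates (26),
and we may proceed by induction. For $i = 0$, $g^0 = 0$ and $f^0 = f$ so there is
nothing to prove. Now assume that the estimates (26) are satisfied for each
$k \leq i$, where $i \in \{0,\dots,m-1\}$. For $k \in \{0,\dots,i\}$, since $\|X_{f^k}\|_{r_1^k,s_1^k} \leq \varepsilon_k$
we get that

\[ \|X_{g^{k+1}}-X_{g^k}\|_{r_1^k,s_1^k} <\!\!\cdot\, \varepsilon_k, \]

and therefore

\[ \|X_{g^{i+1}}\|_{r_1^{i+1},s_1^{i+1}} <\!\!\cdot\, \sum_{k=0}^{i}\varepsilon_k <\!\!\cdot\, \varepsilon, \]

so this gives the desired estimate for $X_{g^{i+1}}$. For $X_{f^{i+1}}$, note that

\[ \|X_{f^{i+1}}\|_{r_1^{i+1},s_1^{i+1}} <\!\!\cdot\, T_1\left(\frac{r_1}{s'}+\frac{\varepsilon}{r'}\right)\|X_{f^i}\|_{r_1^i,s_1^i}, \]

but

\[ T_1\left(\frac{r_1}{s'}+\frac{\varepsilon}{r'}\right) =\!\!\cdot \left(\frac{mT_1r_1}{s_1}+\frac{mT_1\varepsilon}{r_1}\right) \]

so choosing properly the implicit constants in $(A_1)$ we can ensure that

\[ T_1\left(\frac{r_1}{s'}+\frac{\varepsilon}{r'}\right) \,\cdot\!\!< e^{-1}, \]

which implies the estimate for $X_{f^{i+1}}$ and concludes this first step.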
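The bookkeeping of this first step can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the function name `first_step_schedule` and the sample values of $r_1$, $s_1$, $\varepsilon$ and $m$ are hypothetical, and the sketch simply tabulates the radii $r_1^i = r_1 - ir'$, $s_1^i = s_1 - is'$ and the sizes $\varepsilon_i = e^{-i}\varepsilon$ appearing in the claim (26); it does not check the smallness conditions $(A_1)$.

```python
import math


def first_step_schedule(r1, s1, eps, m):
    """Radii and error sizes along the m applications of the iterative lemma.

    Each step shrinks the domain by r' = r1/(3m), s' = s1/(3m) and divides
    the size of the remainder by e, following the claim (26).
    """
    r_step, s_step = r1 / (3 * m), s1 / (3 * m)
    schedule = []
    for i in range(m + 1):
        schedule.append({
            "i": i,
            "r": r1 - i * r_step,           # r_1^i
            "s": s1 - i * s_step,           # s_1^i
            "eps": math.exp(-i) * eps,      # eps_i = e^{-i} eps
        })
    return schedule
```

After $m$ steps the domain has shrunk to radii $2r_1/3$ and $2s_1/3$, the remainder has size $e^{-m}\varepsilon$, and the geometric sum $\sum_k \varepsilon_k$ stays below $\varepsilon\, e/(e-1)$, which is the reason the accumulated size of $g$ remains of order $\varepsilon$.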

Inductive step. Now assume that the statement holds true for some
$j \in \{1,\dots,n-1\}$, and we have to show that it remains true for $j+1$. By
assumption, there exists an analytic symplectic transformation

\[ \Psi_j : \mathcal{D}_{2r_j/3,2s_j/3}(\omega_j) \longrightarrow \mathcal{D}_{r_1,s_1}(\omega_1) \]

such that

\[ H\circ\Psi_j = h + g_j + f_j, \]

with $\{g_j,l_i\}=0$ for $i \in \{1,\dots,j\}$ and the estimates

\[ \|X_{g_j}\|_{2r_j/3,2s_j/3} <\!\!\cdot\, \varepsilon, \qquad \|X_{f_j}\|_{2r_j/3,2s_j/3} <\!\!\cdot\, e^{-m}\varepsilon. \]

Also, $\Psi_j = \Phi_1\circ\cdots\circ\Phi_j$ with

\[ \Phi_i : \mathcal{D}_{2r_i/3,2s_i/3}(\omega_i) \longrightarrow \mathcal{D}_{r_i,s_i}(\omega_i) \]

such that $|\Phi_i-\mathrm{Id}|_{2r_i/3,2s_i/3} \,\cdot\!\!< r_i$ for $i \in \{1,\dots,j\}$. Furthermore, $(A_{j+1})$
holds. Now consider the Hamiltonian $h + g_j$: it is defined on $\mathcal{D}_{2r_j/3,2s_j/3}(\omega_j)$,
hence by $(A_{j+1})$, it is also defined on the domain $\mathcal{D}_{r_{j+1},s_{j+1}}(\omega_{j+1})$ and it
satisfies $\{g_j,l_i\}=0$ for $i \in \{1,\dots,j\}$. Moreover, we have the estimate

\[ \|X_{g_j}\|_{r_{j+1},s_{j+1}} \leq \|X_{g_j}\|_{2r_j/3,2s_j/3} <\!\!\cdot\, \varepsilon. \]

As in the first step, starting this time with the Hamiltonian

\[ h + g_j = h + g_j^0 + f_j^0, \]

with $g_j^0 = 0$ and $f_j^0 = g_j$, we can apply $m$ times our second iterative
Lemma A.6 to have the following: there exists an analytic symplectic
transformation

\[ \Phi_{j+1} : \mathcal{D}_{2r_{j+1}/3,2s_{j+1}/3}(\omega_{j+1}) \longrightarrow \mathcal{D}_{r_{j+1},s_{j+1}}(\omega_{j+1}) \]

of the form $\Phi_{j+1} = \varphi_{j+1}^0\circ\cdots\circ\varphi_{j+1}^{m-1}$ such that $|\Phi_{j+1}-\mathrm{Id}|_{2r_{j+1}/3,2s_{j+1}/3} \,\cdot\!\!< r_{j+1}$
and

\[ (h + g_j)\circ\Phi_{j+1} = h + g_j^m + f_j^m, \]

with $\{g_j^m,l_i\}=0$ for $i \in \{1,\dots,j+1\}$, and the estimates

\[ \|X_{g_j^m}\|_{2r_{j+1}/3,2s_{j+1}/3} <\!\!\cdot\, \varepsilon, \qquad \|X_{f_j^m}\|_{2r_{j+1}/3,2s_{j+1}/3} <\!\!\cdot\, e^{-m}\varepsilon. \]

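Now we set

\[ \Psi_{j+1} = \Psi_j\circ\Phi_{j+1} : \mathcal{D}_{2r_{j+1}/3,2s_{j+1}/3}(\omega_{j+1}) \longrightarrow \mathcal{D}_{r_1,s_1}(\omega_1), \]

which is well-defined by $(A_{j+1})$, to have

\[
\begin{aligned}
H\circ\Psi_{j+1} &= (H\circ\Psi_j)\circ\Phi_{j+1} \\
&= (h + g_j + f_j)\circ\Phi_{j+1} \\
&= (h + g_j)\circ\Phi_{j+1} + f_j\circ\Phi_{j+1} \\
&= h + g_j^m + f_j^m + f_j\circ\Phi_{j+1} \\
&= h + g_{j+1} + f_{j+1}
\end{aligned}
\]

with $g_{j+1} = g_j^m$ and $f_{j+1} = f_j^m + f_j\circ\Phi_{j+1}$. The conclusions follow:
$\{g_{j+1},l_i\}=0$ for $i \in \{1,\dots,j+1\}$, we have the estimate

\[ \|X_{g_{j+1}}\|_{2r_{j+1}/3,2s_{j+1}/3} <\!\!\cdot\, \varepsilon \]

and since

\[ \|X_{f_j\circ\Phi_{j+1}}\|_{2r_{j+1}/3,2s_{j+1}/3} <\!\!\cdot\, \|X_{f_j}\|_{r_{j+1},s_{j+1}} \leq \|X_{f_j}\|_{2r_j/3,2s_j/3} \]

we also have

\[
\begin{aligned}
\|X_{f_{j+1}}\|_{2r_{j+1}/3,2s_{j+1}/3} &<\!\!\cdot\, \|X_{f_j^m}\|_{2r_{j+1}/3,2s_{j+1}/3} + \|X_{f_j}\|_{2r_j/3,2s_j/3} \\
&<\!\!\cdot\, e^{-m}\varepsilon.
\end{aligned}
\]

The proof is therefore complete. □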

B SDM functions 

In this appendix, we will study our class of SDM functions. We will first
show in B.1 that they satisfy an adapted steepness property, which we used
in the proof of our exponential estimates, and then in B.2 we will prove that
they are generic. These results are similar to [Nie07].

B.1 Steepness.



1. We denote by $GA_B(n,k)$ the set of all affine subspaces of $\mathbb{R}^n$ of dimension
$k$ intersecting the ball $B$, and by $GA_B^L(n,k)$ those subspaces with direction
in $G^L(n,k)$ (the latter is the space of linear subspaces of $\mathbb{R}^n$ of dimension
$k$ whose orthogonal complement is spanned by integer vectors of length less
than or equal to $L$). Let us recall the classical steepness condition, originally
introduced by N.N. Nekhoroshev ([Nek77]).

Definition B.1. A function $h \in C^2(B)$ is said to be steep if it has no critical
points and if for any $k \in \{1,\dots,n-1\}$, there exist an index $p_k > 0$ and
coefficients $C_k > 0$, $\delta_k > 0$ such that for any affine subspace $\lambda_k \in GA_B(n,k)$
and any continuous curve $\Gamma : [0,1] \to \lambda_k \cap B$ with

\[ \|\Gamma(0)-\Gamma(1)\| = r < \delta_k, \]

there exists $t_* \in [0,1]$ such that:

\[ \|\Gamma(t)-\Gamma(0)\| \leq r, \quad t \in [0,t_*], \]

\[ \|\Pi_{\Lambda_k}(\nabla h(\Gamma(t_*)))\| > C_k r^{p_k}, \]

where $\Pi_{\Lambda_k}$ is the projection onto $\Lambda_k$, the direction of $\lambda_k$.

The function is said to be symmetrically steep (or shortly S-steep) if
the above property is also satisfied for $k = n$, with an index $p_n > 0$ and
coefficients $C_n > 0$, $\delta_n > 0$.

Let us remark that S-steep functions are allowed to have critical points.
These definitions are rather obscure, but in fact they can be given a simpler
and more geometric interpretation, as was shown by Ilyashenko ([Ily86]) and
Niederman ([Nie06]). Important examples of steep functions are given by
the class of strictly convex (or quasi-convex) functions, with all the steepness
indices equal to one.

2. A typical example of a non-steep function, which is due to Nekhoroshev,
is $h(I_1,I_2) = I_1^2 - I_2^2$, and it is not exponentially stable: for the perturbation
$h_\varepsilon(\theta,I) = I_1^2 - I_2^2 + \varepsilon\sin(\theta_1+\theta_2)$, any solution with $I_1(0) = I_2(0)$ has a
fast drift, that is a drift of order one on a time scale of order $\varepsilon^{-1}$ (this is
obviously the fastest drift possible). Along such a solution, $I_1 - I_2$ and
$\theta_1 + \theta_2$ are constant, so that $\dot I_1 = -\varepsilon\cos(\theta_1(0)+\theta_2(0))$ is constant as well. But adding
a third order term in the previous example (for example $I_2^3$) we recover
steepness, and this is in fact a general phenomenon. Indeed, non-steep
functions have infinite codimension among smooth functions, or more
precisely, if $J^r(n)$ is the space of $r$-jets of $C^\infty$ functions on an open set of
$\mathbb{R}^n$, then Nekhoroshev proved in [Nek79] that the set of $r$-jets of non-steep
functions is an algebraic subset of $J^r(n)$ whose codimension goes to infinity
as $r$ goes to infinity. In this sense, steep functions are "generic". However,
for $n > 3$, a quadratic Hamiltonian is steep only if it is sign definite, which
is a strong assumption, and more generally a polynomial is generically steep
only if its degree is sufficiently high (of order $n^2$ if $n$ is the number of
degrees of freedom). Hence polynomials of lower degree are generically
non-steep (see [LM88]). This is clearly a shortcoming, and we will see at the
end of the next section the advantage of our genericity condition.
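The fast drift in the example of item 2 can be observed numerically. The following Python sketch is only an illustration: it assumes the perturbation is $\varepsilon\sin(\theta_1+\theta_2)$ with canonical equations $\dot\theta = \partial_I H$, $\dot I = -\partial_\theta H$, it uses a classical Runge-Kutta scheme (our choice, not the paper's), and the names `vector_field`, `rk4_step` and `action_drift` as well as the initial data are hypothetical.

```python
import math


def vector_field(state, eps):
    """Hamilton's equations for H = I1^2 - I2^2 + eps*sin(theta1 + theta2)."""
    th1, th2, i1, i2 = state
    force = -eps * math.cos(th1 + th2)   # -dH/dtheta1 = -dH/dtheta2
    return (2.0 * i1, -2.0 * i2, force, force)


def rk4_step(state, eps, dt):
    """One classical Runge-Kutta step of size dt."""
    def shifted(base, direction, scale):
        return tuple(b + scale * d for b, d in zip(base, direction))

    k1 = vector_field(state, eps)
    k2 = vector_field(shifted(state, k1, dt / 2), eps)
    k3 = vector_field(shifted(state, k2, dt / 2), eps)
    k4 = vector_field(shifted(state, k3, dt), eps)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def action_drift(eps, i0=0.3, steps=10000):
    """Change of I1 over the time interval [0, 1/eps] when I1(0) = I2(0) = i0."""
    state = (0.0, 0.0, i0, i0)           # theta1(0) = theta2(0) = 0
    dt = (1.0 / eps) / steps
    for _ in range(steps):
        state = rk4_step(state, eps, dt)
    return state[2] - i0
```

With these initial angles the phase $\theta_1+\theta_2$ stays equal to zero, so $I_1$ decreases at the constant rate $\varepsilon$ and `action_drift(eps)` is close to $-1$ whatever the size of `eps`: a drift of order one over a time $\varepsilon^{-1}$.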

3. Steepness (or S-steepness) is a sufficient condition to ensure exponential
stability, but it is not necessary, as was first noticed by Morbidelli and
Guzzo (see [MG96]). They considered the Hamiltonian $h(I_1,I_2) = I_1^2 - aI_2^2$,
which is non-steep for any value of $a > 0$, and noticed that a "fast
drift" is not possible if $\sqrt{a}$ is "strongly" irrational. Therefore a Diophantine
condition on $\sqrt{a}$ should ensure exponential stability.

Such considerations were then generalized by Niederman, who introduced
the class of "Diophantine Morse" functions and proved that they are
exponentially stable ([Nie07]). The only difference between these functions
and the "Simultaneous Diophantine Morse" functions we use in this paper
is that Diophantine Morse functions consider subspaces in $G_L(n,k)$, which
are generated by integer vectors of length bounded by $L$, while here we
are looking at subspaces in $G^L(n,k)$, where the latter condition is imposed
on the orthogonal complement. This reflects the difference between the
methods of proof: in [Nie07] the analytic part was based on classical small
divisors techniques (that is, linear Diophantine approximation) and therefore
required an adapted geometric assumption, while here we simply rely on the
most basic theorem of simultaneous Diophantine approximation (and this
explains the name Simultaneous Diophantine Morse functions).
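The "most basic theorem" referred to above is presumably Dirichlet's theorem on simultaneous approximation (this identification is an assumption on our part, since the statement is not reproduced in this appendix): for real numbers $\alpha_1,\dots,\alpha_n$ and an integer $Q > 1$, there is an integer $q$ with $1 \leq q \leq Q^n$ such that every $q\alpha_i$ lies within $1/Q$ of an integer. The brute-force Python sketch below only illustrates this statement on a sample vector; the function names are hypothetical.

```python
import math


def distance_to_integers(x):
    """Distance from the real number x to the nearest integer."""
    return abs(x - round(x))


def simultaneous_approximation(alpha, Q):
    """Smallest q in [1, Q**n] with max_i dist(q*alpha_i, Z) <= 1/Q.

    Dirichlet's theorem guarantees that such a q exists; the brute-force
    search below only illustrates the statement.
    """
    n = len(alpha)
    for q in range(1, Q ** n + 1):
        if max(distance_to_integers(q * a) for a in alpha) <= 1.0 / Q:
            return q
    return None
```

For instance, `simultaneous_approximation((math.sqrt(2), math.sqrt(3)), 5)` returns a common denominator $q \leq 25$ for which both $q\sqrt{2}$ and $q\sqrt{3}$ are within $1/5$ of an integer, so that the vector is close to a rational (hence periodic) one.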

4. In both cases, the use of such a class of functions has two advantages. 
The first one is that these functions are generic in a much clearer sense
than steep functions, and this will be explained in the next section. The 
second advantage is that they are in some sense more general than the usual 
steep functions, since we only have to consider curves in some specific affine 
subspaces. This is explained in the proposition below. 

Proposition B.2. Let $h \in SDM_\gamma^\tau(B)$, assume that $|h|_{C^3(B)} < M$ and take
$r < 1$. Then for any affine subspace $\lambda \in GA_B^L(n,k)$ and any continuous
curve $\Gamma : [0,1] \to \lambda \cap B$ with

\[ \|\Gamma(0)-\Gamma(1)\| = r < (2M)^{-1}\gamma L^{-\tau}, \]

there exists $t_* \in [0,1]$ such that:

\[ \|\Gamma(t)-\Gamma(0)\| \leq r, \quad t \in [0,t_*], \]

\[ \|\Pi_\Lambda(\nabla h(\Gamma(t_*)))\| > \frac{r^2}{2}, \]

where $\Pi_\Lambda$ is the projection onto $\Lambda$, the direction of $\lambda$.

Proof. It is enough to check that these properties are satisfied for a vector
space $\Lambda \in G^L(n,k)$, since any affine subspace $\lambda \in GA_B^L(n,k)$ is of the form
$\lambda = v + \Lambda$ with $\Lambda \in G^L(n,k)$ for some vector $v$. So consider a continuous
curve $\Gamma : [0,1] \to \Lambda \cap B$ with length $r < 1$ satisfying

\[ \|\Gamma(0)-\Gamma(1)\| = r < (2M)^{-1}\gamma L^{-\tau}. \]

We will denote by $(\alpha(t),\beta)$ the coordinates of $\Gamma(t)$ for $t \in [0,1]$ in a basis
adapted to the orthogonal decomposition $\Lambda \oplus \Lambda^\perp$. Therefore

\[ \|\Pi_\Lambda(\nabla h(\Gamma(t)))\| = \|\partial_\alpha h_\Lambda(\alpha(t),\beta)\| \]

for all $t \in [0,1]$. We will distinguish two cases.
For the first one, we suppose that

\[ \|\partial_\alpha h_\Lambda(\alpha(0),\beta)\| > 2^{-1}r^2, \]

so the conclusion trivially holds for $t_* = 0$.
For the second one, we have

\[ \|\partial_\alpha h_\Lambda(\alpha(0),\beta)\| \leq 2^{-1}r^2, \tag{27} \]

but since $r^2 < r < \gamma L^{-\tau}$, this gives

\[ \|\partial_\alpha h_\Lambda(\alpha(0),\beta)\| < \gamma L^{-\tau}. \]

Now $h \in SDM_\gamma^\tau(B)$, so we can apply the definition at the point $(\alpha(0),\beta)$,
and for any $\eta \in \mathbb{R}^k \setminus \{0\}$ we obtain

\[ \|\partial_{\alpha\alpha}h_\Lambda(\alpha(0),\beta).\eta\| > \gamma L^{-\tau}\|\eta\|. \tag{28} \]

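Take any $\tilde\alpha$ such that $\|\tilde\alpha-\alpha(0)\| < (2M)^{-1}\gamma L^{-\tau}$. We can apply Taylor
formula with integral remainder to obtain

\[ \partial_\alpha h_\Lambda(\tilde\alpha,\beta) - \partial_\alpha h_\Lambda(\alpha(0),\beta) = \int_0^1 \partial_{\alpha\alpha}h_\Lambda(\alpha(0)+t(\tilde\alpha-\alpha(0)),\beta).(\tilde\alpha-\alpha(0))\,dt. \]

Now since $M$ bounds the third derivative of $h$, we have

\[ \|\partial_{\alpha\alpha}h_\Lambda(\alpha(0)+t(\tilde\alpha-\alpha(0)),\beta) - \partial_{\alpha\alpha}h_\Lambda(\alpha(0),\beta)\| \leq Mt\|\tilde\alpha-\alpha(0)\| < 2^{-1}\gamma L^{-\tau}t, \]

and this yields

\[ \|\partial_\alpha h_\Lambda(\tilde\alpha,\beta) - \partial_\alpha h_\Lambda(\alpha(0),\beta)\| \geq \|\partial_{\alpha\alpha}h_\Lambda(\alpha(0),\beta).(\tilde\alpha-\alpha(0))\| - 2^{-1}\gamma L^{-\tau}\int_0^1 t\|\tilde\alpha-\alpha(0)\|\,dt, \]

which in turn, using (28) with $\eta = \tilde\alpha-\alpha(0)$, gives

\[
\begin{aligned}
\|\partial_\alpha h_\Lambda(\tilde\alpha,\beta) - \partial_\alpha h_\Lambda(\alpha(0),\beta)\| &> \left(\gamma L^{-\tau} - 2^{-1}\gamma L^{-\tau}\int_0^1 t\,dt\right)\|\tilde\alpha-\alpha(0)\| \\
&> 2^{-1}\gamma L^{-\tau}\|\tilde\alpha-\alpha(0)\|.
\end{aligned}
\tag{29}
\]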

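Now we define

\[ t_* = \inf\{t \in [0,1] \,:\, \|\Gamma(t)-\Gamma(0)\| = r\}, \]

so trivially we have

\[ \|\Gamma(t)-\Gamma(0)\| \leq r, \quad t \in [0,t_*]. \]

Furthermore, we have

\[ \|\partial_\alpha h_\Lambda(\alpha(t_*),\beta)\| \geq \|\partial_\alpha h_\Lambda(\alpha(t_*),\beta) - \partial_\alpha h_\Lambda(\alpha(0),\beta)\| - \|\partial_\alpha h_\Lambda(\alpha(0),\beta)\|, \]

and so using (27), (29) and recalling that $\|\alpha(t_*)-\alpha(0)\| = r$ and $\gamma L^{-\tau} > 2r$
we obtain

\[
\begin{aligned}
\|\partial_\alpha h_\Lambda(\alpha(t_*),\beta)\| &> 2^{-1}\gamma L^{-\tau}r - 2^{-1}r^2 \\
&> r^2 - 2^{-1}r^2 \\
&= 2^{-1}r^2,
\end{aligned}
\]

and this is the desired estimate. □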
B.2 Prevalence 

5. Here we will prove our results of genericity concerning SDM functions,
that is Theorem 2.2 and Corollary 2.3. Our main tool is the following lemma,
which is proved in [Nie07] and relies on the quantitative Morse-Sard theory
developed by Yomdin (see [YC04] and [Yom83]). Let us denote by $\lambda_k$ the
$k$-dimensional Lebesgue measure.

Lemma B.3. Let $g \in C^{2n+1}(B,\mathbb{R}^k)$. Then for any $\kappa \in\, ]0,1[$ there exists a
subset $C_\kappa \subset \mathbb{R}^k$ with

\[ \lambda_k(C_\kappa) \leq c_k\sqrt{\kappa}, \]

where $c_k$ only depends on $k$, such that for any $\zeta \notin C_\kappa$, the function defined
by $g_\zeta(x) = g(x) - \zeta$ satisfies the following: for any $x \in B$,

\[ \|g_\zeta(x)\| < \kappa \Longrightarrow \|dg_\zeta(x).v\| > \kappa\|v\| \]

for any $v \in \mathbb{R}^n \setminus \{0\}$.

In the above statement, the set $C_\kappa$ is a "nearly-critical set" for the
function $g$.

6. Let us prove Theorem 2.2.

Proof of Theorem 2.2. Recall that we are given a function $h \in C^{2n+2}(B)$.
The proof is divided in two steps: first, we will describe the set of parameters
$\xi \in \mathbb{R}^n$ for which the function $h_\xi$, defined by $h_\xi(I) = h(I) - \xi.I$, is not in
$SDM^\tau(B)$, and then, in a second step, we will show that this set has zero
Lebesgue measure, for $\tau > 2(n^2+1)$. In the sequel, given $k \in \{1,\dots,n\}$,
we denote by $\lambda_k$ the Lebesgue measure of $\mathbb{R}^k$.

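First step. Given an element $\Lambda \in G^L(n,k)$, let $\Pi_\Lambda$ be the projection onto
this subspace and consider the associated function $h_\Lambda$ (recall that $h_\Lambda$ is just
the function $h$ written in coordinates adapted to the orthogonal
decomposition $\Lambda \oplus \Lambda^\perp$). Let us define the function

\[ g = \partial_\alpha h_\Lambda, \]

which belongs to $C^{2n+1}(B,\mathbb{R}^k)$, and apply to this function Lemma B.3 with
the value $\kappa = \gamma L^{-\tau}$. We find a "nearly-critical" set $C_\kappa = C_{\gamma,\tau,L} \subset \mathbb{R}^k$ with
the measure estimate

\[ \lambda_k(C_{\gamma,\tau,L}) \leq c_k\gamma^{\frac12}L^{-\frac{\tau}{2}}, \tag{30} \]

such that for any $\zeta \notin C_{\gamma,\tau,L}$ and any $(\alpha,\beta) \in B$,

\[ \|g_\zeta(\alpha,\beta)\| < \kappa \Longrightarrow \|dg_\zeta(\alpha,\beta).v\| > \kappa\|v\|, \tag{31} \]

for any $v \in \mathbb{R}^n \setminus \{0\}$.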

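Now take any $\zeta \notin C_{\gamma,\tau,L}$, any $\xi \in \Pi_\Lambda^{-1}(\zeta)$ and consider the modified
function $h_\xi$ as well as its version $h_{\xi,\Lambda}$. Since

\[ \partial_\alpha h_{\xi,\Lambda} = \partial_\alpha h_\Lambda - \zeta = g - \zeta = g_\zeta \]

and $\partial_{\alpha\alpha}h_{\xi,\Lambda} = \partial_{\alpha\alpha}h_\Lambda$ is just some restriction of $dg$, the estimate (31) gives
for any $(\alpha,\beta) \in B$,

\[ \|\partial_\alpha h_{\xi,\Lambda}(\alpha,\beta)\| < \gamma L^{-\tau} \Longrightarrow \|\partial_{\alpha\alpha}h_{\xi,\Lambda}(\alpha,\beta).\eta\| > \gamma L^{-\tau}\|\eta\| \tag{32} \]

for any $\eta \in \mathbb{R}^n \setminus \{0\}$. So let $C_{\gamma,\tau,L,\Lambda} = \Pi_\Lambda^{-1}(C_{\gamma,\tau,L})$ and define

\[ C_{\gamma,\tau} = \bigcup_{L\in\mathbb{N}^*}\;\bigcup_{k\in\{1,\dots,n\}}\;\bigcup_{\Lambda\in G^L(n,k)} C_{\gamma,\tau,L,\Lambda}. \]

As a consequence of the estimate (32), the function $h_\xi \in SDM_\gamma^\tau(B)$ provided
that $\xi \notin C_{\gamma,\tau}$, hence $h_\xi \in SDM^\tau(B)$ provided that $\xi \notin C_\tau$, where

\[ C_\tau = \bigcap_{\gamma>0} C_{\gamma,\tau}. \]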

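Second step. It remains to prove that $C_\tau$ has zero Lebesgue measure
under our assumption that $\tau > 2(n^2+1)$. For an integer $m \in \mathbb{N}^*$, we define
$C^m_{\gamma,\tau,L,\Lambda}$ (resp. $C^m_{\gamma,\tau}$ and $C^m_\tau$) as the intersection of $C_{\gamma,\tau,L,\Lambda}$ (resp. $C_{\gamma,\tau}$ and
$C_\tau$) with the ball of $\mathbb{R}^n$ of radius $m$ centered at the origin. As a consequence
of (30) and Fubini-Tonelli theorem, one has

\[ \lambda_n(C^m_{\gamma,\tau,L,\Lambda}) \leq V_{n,m}\,c_k\,L^{-\frac{\tau}{2}}\gamma^{\frac12}, \]

where $V_{n,m} = m^n\pi^{n/2}\Gamma(n/2+1)^{-1}$ is the volume of the ball of $\mathbb{R}^n$ of radius
$m$ centered at the origin. Therefore

\[ \lambda_n\Bigg(\bigcup_{\Lambda\in G^L(n,k)} C^m_{\gamma,\tau,L,\Lambda}\Bigg) \leq |G^L(n,k)|\,V_{n,m}\,c_k\,L^{-\frac{\tau}{2}}\gamma^{\frac12}, \]

with $|G^L(n,k)|$ the cardinal of $G^L(n,k)$. But obviously $|G^L(n,k)| \leq L^{n^2}$
and hence

\[ \lambda_n\Bigg(\bigcup_{\Lambda\in G^L(n,k)} C^m_{\gamma,\tau,L,\Lambda}\Bigg) \leq V_{n,m}\,c_k\,L^{n^2-\frac{\tau}{2}}\gamma^{\frac12}. \]

Now

\[ \lambda_n\Bigg(\bigcup_{k\in\{1,\dots,n\}}\;\bigcup_{\Lambda\in G^L(n,k)} C^m_{\gamma,\tau,L,\Lambda}\Bigg) \leq V_{n,m}\Bigg(\sum_{k=1}^n c_k\Bigg)L^{n^2-\frac{\tau}{2}}\gamma^{\frac12} \]

and so

\[ \lambda_n(C^m_{\gamma,\tau}) \leq V_{n,m}\Bigg(\sum_{k=1}^n c_k\Bigg)\Bigg(\sum_{L\in\mathbb{N}^*} L^{n^2-\frac{\tau}{2}}\Bigg)\gamma^{\frac12}, \]

where the sum over $L$ in the right-hand side of the last estimate is finite since we
are assuming $\tau > 2(n^2+1)$. This shows that

\[ \lambda_n(C^m_\tau) = \inf_{\gamma>0}\lambda_n(C^m_{\gamma,\tau}) = 0, \]

and as $C_\tau = \bigcup_{m\geq 1} C^m_\tau$ we finally obtain

\[ \lambda_n(C_\tau) = 0, \]

and this concludes the proof. □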
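The two elementary ingredients of the second step, the volume $V_{n,m}$ and the convergence of the series over $L$, can be checked numerically. The Python sketch below is an illustration only; the function names are hypothetical, and the convergence test is simply the usual criterion for a $p$-series, which is where the threshold $\tau > 2(n^2+1)$ comes from.

```python
import math


def ball_volume(n, m):
    """Volume of the ball of radius m in R^n: m^n * pi^(n/2) / Gamma(n/2 + 1)."""
    return m ** n * math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def series_exponent(n, tau):
    """Exponent of L in the sum over L of L^(n^2 - tau/2)."""
    return n ** 2 - tau / 2


def series_converges(n, tau):
    """A p-series sum L^(-p) converges exactly when p > 1."""
    return -series_exponent(n, tau) > 1


def partial_sum(n, tau, terms=100000):
    """Partial sum of L^(n^2 - tau/2) for L = 1, ..., terms."""
    exponent = series_exponent(n, tau)
    return sum(L ** exponent for L in range(1, terms + 1))
```

For $n = 2$ the threshold is $\tau = 10$: `series_converges(2, 10)` is false while `series_converges(2, 10.5)` is true, and for $\tau = 12$ the partial sums stay below $\pi^2/6$.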



7. As we mentioned in the introduction, there is a notion of genericity
in infinite dimensional vector spaces called prevalence, first introduced in a
different setting by Christensen ([Chr73]) and rediscovered by Hunt, Sauer
and Yorke ([HSY92], see also [OY05] and [HK10]).

Definition B.4. Let $E$ be a completely metrizable topological vector space.
A Borel subset $S \subset E$ is said to be shy if there exists a Borel measure $\mu$
on $E$, with $0 < \mu(C) < \infty$ for some compact set $C \subset E$, and such that
$\mu(x + S) = 0$ for all $x \in E$.

An arbitrary set is called shy if it is contained in a shy Borel subset, and
finally the complement of a shy set is called prevalent.

The following "genericity" properties are easy to check ([OY05], [HK10]):
a prevalent set is dense, a set containing a prevalent set is also prevalent, 
and prevalent sets are stable under translation and countable intersection. 

Furthermore, we have an easy but useful criterion for a set to be preva- 
lent. 

Proposition B.5 ([HSY92]). Let $P$ be a subset of $E$. Suppose there exists
a finite-dimensional subspace $F$ of $E$ such that $x + P$ has full $\lambda_F$-measure
for all $x \in E$. Then $P$ is prevalent.

8. Now we can prove our Corollary 2.3.

Proof of Corollary 2.3. Let $E = C^{2n+2}(B)$, $P = SDM^\tau(B)$ for $\tau > 2(n^2+1)$
and $F$ the space of linear forms of $\mathbb{R}^n$ restricted to $B$. Then $F$ is a
linear subspace of $C^{2n+2}(B)$ of dimension $n$, and the conclusion follows
immediately from Theorem 2.2 and the above Proposition B.5. □

9. To conclude, let us compare our generic condition with the usual
steepness property. First, our condition is prevalent in the space $C^k(B)$, with
$k \geq 2n+2$, and this is not true for steep functions. But more importantly, as
prevalence is nothing but "full Lebesgue measure" in finite dimension, given
any non-zero integers $m$ and $n$, Lebesgue almost every polynomial Hamiltonian
$h_m$ of degree $m$ with $n$ degrees of freedom is SDM, but not steep unless $m$
is of order $n^2$. This remark turns out to be very useful when studying the
stability of invariant tori under generic conditions (see [Bou09]).